SMM634 Group Assignment

Actuarial Science: Group 4

Ardi Wira Sudarmo

Basmah Khan

Benjamin Evans

29 October 2025

1 Key Information

1.1 Background

This R Markdown document was created as part of a group assignment for SMM634 at Bayes Business School, City St George’s, University of London in Term 1 2025-26.

  • BE note: use rmarkdown::render("MSc_AS-SMM634-Group4-Project.rmd", output_dir = "docs") to render this document

1.2 An interactive pricing model?

An experimental part of this work was an interactive pricing model. Please see this link for more information (the R Shiny web app may take a few seconds to load): Interactive Price Prediction

  • Planned future development: the ability to switch between different models in this interface, to see how each gives a different prediction for the price of a car in 1985
  • BE note: this currently has no data validation (negative values can be entered; this is interesting to play around with, so data-entry checks have not been added for now)

2 Initial processing

## dependencies / external libraries
library(dplyr)
library(ggplot2)
library(patchwork) #for side by side ggplots
library(car)
library(MASS)
library(jtools)

dir.create("fig", showWarnings = FALSE)
knitr::opts_chunk$set(
  fig.path   = "fig/",
  dpi        = 300,
  fig.width  = 6,
  fig.height = 4,
  dev        = "png"
)
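# Note: this second call overrides dev = "png" set above, so figures are
# rendered with svglite (the svglite package must be installed)
knitr::opts_chunk$set(dev = "svglite")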

2.1 Read Data

rm(list = ls())
# CarDataRead.R

# source(file.path(".", "CarDataRead.R"))
df <- read.csv("car_price.csv") |>
  mutate(fsymboling = as.factor(symboling)) |>
  mutate(safety = case_match(
    symboling,
    c(-2, -1) ~ "<-1",
    0 ~ "0",
    1 ~ "1",
    2 ~ "2",
    3 ~ "3"
  )) |>
  mutate(safetyIncr = case_match(
    symboling,
    c(-2, -1) ~ "4",
    0 ~ "3",
    1 ~ "2",
    2 ~ "1",
    3 ~ "0"
  )) |>
  mutate(safetyIncr2 = case_match(
    symboling,
    0 ~ "Base",
    c(-2, -1) ~ "Safer",
    c(1, 2, 3) ~ "Riskier"
  )) |>
  mutate(cylinderNum = case_match(cylindernumber,
    c("two", "three") ~ "leq_three",
    c("eight", "twelve") ~ "geq_eight",
    .default = cylindernumber
  ))

table(df$cylinderNum)
## 
##      five      four geq_eight leq_three       six 
##        11       159         6         5        24

2.1.1 Clean data

We check for typos, missing data, and NA (including NaN) values.

2.1.2 Car manufacturer

Extracting car manufacturer from first part of CarName and correcting spelling errors.

# extract car manufacturer from first part of CarName
df$carManufacturer <- sapply(strsplit(df$CarName, " +"), `[`, 1)
# print
table(df$carManufacturer)
## 
## alfa-romero        audi         bmw       buick   chevrolet       dodge       honda       isuzu      jaguar 
##           3           7           8           8           3           9          13           4           3 
##       maxda       mazda     mercury  mitsubishi      nissan      Nissan     peugeot    plymouth    porcshce 
##           2          15           1          13          17           1          11           7           1 
##     porsche     renault        saab      subaru      toyota     toyouta   vokswagen  volkswagen       volvo 
##           4           2           6          12          31           1           1           9          11 
##          vw 
##           2

Our assumptions are as follows:

Typo / mis-spelling    Correct spelling
maxda                  mazda
Nissan                 nissan
porcshce               porsche
toyouta                toyota
vokswagen              volkswagen
vw                     volkswagen

df <- df %>% mutate(carManufacturer = case_match(carManufacturer,
  "maxda" ~ "mazda",
  "Nissan" ~ "nissan",
  "porcshce" ~ "porsche",
  "toyouta" ~ "toyota",
  c("vokswagen", "vw") ~ "volkswagen",
  .default = carManufacturer
))
table(df$carManufacturer)
## 
## alfa-romero        audi         bmw       buick   chevrolet       dodge       honda       isuzu      jaguar 
##           3           7           8           8           3           9          13           4           3 
##       mazda     mercury  mitsubishi      nissan     peugeot    plymouth     porsche     renault        saab 
##          17           1          13          18          11           7           5           2           6 
##      subaru      toyota  volkswagen       volvo 
##          12          32          12          11
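
A quick guard against further spelling variants is to compare the cleaned names against an expected list. The sketch below is only illustrative: it uses a small made-up vector (`raw_names`), a hypothetical lookup (`fixes`), and a hypothetical `expected` list rather than the assignment data, and it relies on base R only.

```r
# Hypothetical sample of raw manufacturer names (not the assignment data)
raw_names <- c("mazda", "maxda", "Nissan", "vw", "toyouta", "audi")

# Hypothetical lookup: names are the mis-spellings, values the corrections
fixes <- c(
  maxda     = "mazda",
  Nissan    = "nissan",
  porcshce  = "porsche",
  toyouta   = "toyota",
  vokswagen = "volkswagen",
  vw        = "volkswagen"
)

# Replace only the entries that appear in the lookup; keep the rest unchanged
needs_fix <- raw_names %in% names(fixes)
clean_names <- raw_names
clean_names[needs_fix] <- unname(fixes[raw_names[needs_fix]])

# Flag anything that is still not in the expected set of manufacturers
expected <- c("audi", "mazda", "nissan", "toyota", "volkswagen")
unexpected <- setdiff(clean_names, expected)

print(clean_names)
print(unexpected)
```

An empty `unexpected` vector suggests that no unhandled variants remain in the sample; any name listed there would need a new entry in the lookup.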

2.1.3 NA value check

# count NA values in each column (is.na() also flags NaN)
colSums(is.na(df))
##           car_ID        symboling          CarName         fueltype       aspiration       doornumber 
##                0                0                0                0                0                0 
##          carbody       drivewheel   enginelocation        wheelbase        carlength         carwidth 
##                0                0                0                0                0                0 
##        carheight       curbweight       enginetype   cylindernumber       enginesize       fuelsystem 
##                0                0                0                0                0                0 
##        boreratio           stroke compressionratio       horsepower          peakrpm          citympg 
##                0                0                0                0                0                0 
##       highwaympg            price       fsymboling           safety       safetyIncr      safetyIncr2 
##                0                0                0                0                0                0 
##      cylinderNum  carManufacturer 
##                0                0
# check total number of NA values
cat(c("Total number of NA values:", sum(colSums(is.na(df)))))
## Total number of NA values: 0

No NA or NaN values were found.
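
A zero count from `is.na()` does not rule out missing values stored as placeholder text, since `read.csv()` only treats the string "NA" as missing by default. The sketch below shows one way to count such placeholders alongside true NA values. The data frame `toy` and the `placeholders` vector are assumptions made for illustration, not part of the assignment data.

```r
# Hypothetical toy data frame (not the assignment data)
toy <- data.frame(
  horsepower = c("111", "?", "154"),
  fueltype   = c("gas", "", "diesel"),
  price      = c(13495, NA, 16500)
)

# Assumed placeholder strings that might stand in for a missing value
placeholders <- c("", "?", "NA", "N/A")

# Count NA values plus placeholder strings in each column
n_missing <- sapply(toy, function(col) {
  sum(is.na(col) | trimws(as.character(col)) %in% placeholders)
})
print(n_missing)
```

Each toy column contains exactly one missing entry of a different kind, so the check covers a question mark, an empty string, and a true NA.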

2.2 Data examination & visualisation

2.2.1 Overview

summary(df)
##      car_ID      symboling         CarName            fueltype          aspiration         doornumber       
##  Min.   :  1   Min.   :-2.0000   Length:205         Length:205         Length:205         Length:205        
##  1st Qu.: 52   1st Qu.: 0.0000   Class :character   Class :character   Class :character   Class :character  
##  Median :103   Median : 1.0000   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :103   Mean   : 0.8341                                                                              
##  3rd Qu.:154   3rd Qu.: 2.0000                                                                              
##  Max.   :205   Max.   : 3.0000                                                                              
##    carbody           drivewheel        enginelocation       wheelbase        carlength        carwidth    
##  Length:205         Length:205         Length:205         Min.   : 86.60   Min.   :141.1   Min.   :60.30  
##  Class :character   Class :character   Class :character   1st Qu.: 94.50   1st Qu.:166.3   1st Qu.:64.10  
##  Mode  :character   Mode  :character   Mode  :character   Median : 97.00   Median :173.2   Median :65.50  
##                                                           Mean   : 98.76   Mean   :174.0   Mean   :65.91  
##                                                           3rd Qu.:102.40   3rd Qu.:183.1   3rd Qu.:66.90  
##                                                           Max.   :120.90   Max.   :208.1   Max.   :72.30  
##    carheight       curbweight    enginetype        cylindernumber       enginesize     fuelsystem       
##  Min.   :47.80   Min.   :1488   Length:205         Length:205         Min.   : 61.0   Length:205        
##  1st Qu.:52.00   1st Qu.:2145   Class :character   Class :character   1st Qu.: 97.0   Class :character  
##  Median :54.10   Median :2414   Mode  :character   Mode  :character   Median :120.0   Mode  :character  
##  Mean   :53.72   Mean   :2556                                         Mean   :126.9                     
##  3rd Qu.:55.50   3rd Qu.:2935                                         3rd Qu.:141.0                     
##  Max.   :59.80   Max.   :4066                                         Max.   :326.0                     
##    boreratio        stroke      compressionratio   horsepower       peakrpm        citympg        highwaympg   
##  Min.   :2.54   Min.   :2.070   Min.   : 7.00    Min.   : 48.0   Min.   :4150   Min.   :13.00   Min.   :16.00  
##  1st Qu.:3.15   1st Qu.:3.110   1st Qu.: 8.60    1st Qu.: 70.0   1st Qu.:4800   1st Qu.:19.00   1st Qu.:25.00  
##  Median :3.31   Median :3.290   Median : 9.00    Median : 95.0   Median :5200   Median :24.00   Median :30.00  
##  Mean   :3.33   Mean   :3.255   Mean   :10.14    Mean   :104.1   Mean   :5125   Mean   :25.22   Mean   :30.75  
##  3rd Qu.:3.58   3rd Qu.:3.410   3rd Qu.: 9.40    3rd Qu.:116.0   3rd Qu.:5500   3rd Qu.:30.00   3rd Qu.:34.00  
##  Max.   :3.94   Max.   :4.170   Max.   :23.00    Max.   :288.0   Max.   :6600   Max.   :49.00   Max.   :54.00  
##      price       fsymboling    safety           safetyIncr        safetyIncr2        cylinderNum       
##  Min.   : 5118   -2: 3      Length:205         Length:205         Length:205         Length:205        
##  1st Qu.: 7788   -1:22      Class :character   Class :character   Class :character   Class :character  
##  Median :10295   0 :67      Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :13277   1 :54                                                                                 
##  3rd Qu.:16503   2 :32                                                                                 
##  Max.   :45400   3 :27                                                                                 
##  carManufacturer   
##  Length:205        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
str(df)
## 'data.frame':    205 obs. of  32 variables:
##  $ car_ID          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ symboling       : int  3 3 1 2 2 2 1 1 1 0 ...
##  $ CarName         : chr  "alfa-romero giulia" "alfa-romero stelvio" "alfa-romero Quadrifoglio" "audi 100 ls" ...
##  $ fueltype        : chr  "gas" "gas" "gas" "gas" ...
##  $ aspiration      : chr  "std" "std" "std" "std" ...
##  $ doornumber      : chr  "two" "two" "two" "four" ...
##  $ carbody         : chr  "convertible" "convertible" "hatchback" "sedan" ...
##  $ drivewheel      : chr  "rwd" "rwd" "rwd" "fwd" ...
##  $ enginelocation  : chr  "front" "front" "front" "front" ...
##  $ wheelbase       : num  88.6 88.6 94.5 99.8 99.4 ...
##  $ carlength       : num  169 169 171 177 177 ...
##  $ carwidth        : num  64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
##  $ carheight       : num  48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
##  $ curbweight      : int  2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
##  $ enginetype      : chr  "dohc" "dohc" "ohcv" "ohc" ...
##  $ cylindernumber  : chr  "four" "four" "six" "four" ...
##  $ enginesize      : int  130 130 152 109 136 136 136 136 131 131 ...
##  $ fuelsystem      : chr  "mpfi" "mpfi" "mpfi" "mpfi" ...
##  $ boreratio       : num  3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
##  $ stroke          : num  2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
##  $ compressionratio: num  9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
##  $ horsepower      : int  111 111 154 102 115 110 110 110 140 160 ...
##  $ peakrpm         : int  5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
##  $ citympg         : int  21 21 19 24 18 19 19 19 17 16 ...
##  $ highwaympg      : int  27 27 26 30 22 25 25 25 20 22 ...
##  $ price           : num  13495 16500 16500 13950 17450 ...
##  $ fsymboling      : Factor w/ 6 levels "-2","-1","0",..: 6 6 4 5 5 5 4 4 4 3 ...
##  $ safety          : chr  "3" "3" "1" "2" ...
##  $ safetyIncr      : chr  "0" "0" "2" "1" ...
##  $ safetyIncr2     : chr  "Riskier" "Riskier" "Riskier" "Riskier" ...
##  $ cylinderNum     : chr  "four" "four" "six" "four" ...
##  $ carManufacturer : chr  "alfa-romero" "alfa-romero" "alfa-romero" "audi" ...

2.2.2 Factor variables: General

table(df$symboling)
## 
## -2 -1  0  1  2  3 
##  3 22 67 54 32 27
table(df$safety)
## 
## <-1   0   1   2   3 
##  25  67  54  32  27
table(df$safetyIncr)
## 
##  0  1  2  3  4 
## 27 32 54 67 25
table(df$cylindernumber)
## 
##  eight   five   four    six  three twelve    two 
##      5     11    159     24      1      1      4
table(df$cylinderNum)
## 
##      five      four geq_eight leq_three       six 
##        11       159         6         5        24
# factor_cols <- c("symboling", "safety", "safetyIncr", "cylindernumber", "cylinderNum","carbody")
# examine categorical fields
factor_cols <- c(
  "symboling",
  "safety",
  "safetyIncr",
  "safetyIncr2",
  "cylindernumber",
  "cylinderNum",
  "carbody",
  "fueltype",
  "aspiration",
  "doornumber",
  "drivewheel",
  "enginelocation", # BE note: 202:3 split so don't include in model
  "enginetype",
  "fuelsystem",
  "carManufacturer"
)

cols_use <- unique(intersect(factor_cols, names(df)))
fc_missing <- setdiff(unique(factor_cols), names(df))
if (length(fc_missing)) {
  message(
    "Missing in df (skipped): ",
    paste(fc_missing, collapse = ", ")
  )
}
tab_list <- setNames(
  lapply(cols_use, function(v) table(df[[v]], useNA = "ifany")),
  # lapply(cols_use, function(v) prop.table(table(df[[v]], useNA = "ifany"))),
  cols_use
)
for (nm in names(tab_list)) {
  cat("\n==== ", nm, " ====\n", sep = "")
  print(tab_list[[nm]])
}
## 
## ==== symboling ====
## 
## -2 -1  0  1  2  3 
##  3 22 67 54 32 27 
## 
## ==== safety ====
## 
## <-1   0   1   2   3 
##  25  67  54  32  27 
## 
## ==== safetyIncr ====
## 
##  0  1  2  3  4 
## 27 32 54 67 25 
## 
## ==== safetyIncr2 ====
## 
##    Base Riskier   Safer 
##      67     113      25 
## 
## ==== cylindernumber ====
## 
##  eight   five   four    six  three twelve    two 
##      5     11    159     24      1      1      4 
## 
## ==== cylinderNum ====
## 
##      five      four geq_eight leq_three       six 
##        11       159         6         5        24 
## 
## ==== carbody ====
## 
## convertible     hardtop   hatchback       sedan       wagon 
##           6           8          70          96          25 
## 
## ==== fueltype ====
## 
## diesel    gas 
##     20    185 
## 
## ==== aspiration ====
## 
##   std turbo 
##   168    37 
## 
## ==== doornumber ====
## 
## four  two 
##  115   90 
## 
## ==== drivewheel ====
## 
## 4wd fwd rwd 
##   9 120  76 
## 
## ==== enginelocation ====
## 
## front  rear 
##   202     3 
## 
## ==== enginetype ====
## 
##  dohc dohcv     l   ohc  ohcf  ohcv rotor 
##    12     1    12   148    15    13     4 
## 
## ==== fuelsystem ====
## 
## 1bbl 2bbl 4bbl  idi  mfi mpfi spdi spfi 
##   11   66    3   20    1   94    9    1 
## 
## ==== carManufacturer ====
## 
## alfa-romero        audi         bmw       buick   chevrolet       dodge       honda       isuzu      jaguar 
##           3           7           8           8           3           9          13           4           3 
##       mazda     mercury  mitsubishi      nissan     peugeot    plymouth     porsche     renault        saab 
##          17           1          13          18          11           7           5           2           6 
##      subaru      toyota  volkswagen       volvo 
##          12          32          12          11
df <- df %>%
  mutate(across(all_of(factor_cols), as.factor))

2.2.3 Factor variables: check stored as factors in R

str(df)
## 'data.frame':    205 obs. of  32 variables:
##  $ car_ID          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ symboling       : Factor w/ 6 levels "-2","-1","0",..: 6 6 4 5 5 5 4 4 4 3 ...
##  $ CarName         : chr  "alfa-romero giulia" "alfa-romero stelvio" "alfa-romero Quadrifoglio" "audi 100 ls" ...
##  $ fueltype        : Factor w/ 2 levels "diesel","gas": 2 2 2 2 2 2 2 2 2 2 ...
##  $ aspiration      : Factor w/ 2 levels "std","turbo": 1 1 1 1 1 1 1 1 2 2 ...
##  $ doornumber      : Factor w/ 2 levels "four","two": 2 2 2 1 1 2 1 1 1 2 ...
##  $ carbody         : Factor w/ 5 levels "convertible",..: 1 1 3 4 4 4 4 5 4 3 ...
##  $ drivewheel      : Factor w/ 3 levels "4wd","fwd","rwd": 3 3 3 2 1 2 2 2 2 1 ...
##  $ enginelocation  : Factor w/ 2 levels "front","rear": 1 1 1 1 1 1 1 1 1 1 ...
##  $ wheelbase       : num  88.6 88.6 94.5 99.8 99.4 ...
##  $ carlength       : num  169 169 171 177 177 ...
##  $ carwidth        : num  64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
##  $ carheight       : num  48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
##  $ curbweight      : int  2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
##  $ enginetype      : Factor w/ 7 levels "dohc","dohcv",..: 1 1 6 4 4 4 4 4 4 4 ...
##  $ cylindernumber  : Factor w/ 7 levels "eight","five",..: 3 3 4 3 2 2 2 2 2 2 ...
##  $ enginesize      : int  130 130 152 109 136 136 136 136 131 131 ...
##  $ fuelsystem      : Factor w/ 8 levels "1bbl","2bbl",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ boreratio       : num  3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
##  $ stroke          : num  2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
##  $ compressionratio: num  9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
##  $ horsepower      : int  111 111 154 102 115 110 110 110 140 160 ...
##  $ peakrpm         : int  5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
##  $ citympg         : int  21 21 19 24 18 19 19 19 17 16 ...
##  $ highwaympg      : int  27 27 26 30 22 25 25 25 20 22 ...
##  $ price           : num  13495 16500 16500 13950 17450 ...
##  $ fsymboling      : Factor w/ 6 levels "-2","-1","0",..: 6 6 4 5 5 5 4 4 4 3 ...
##  $ safety          : Factor w/ 5 levels "<-1","0","1",..: 5 5 3 4 4 4 3 3 3 2 ...
##  $ safetyIncr      : Factor w/ 5 levels "0","1","2","3",..: 1 1 3 2 2 2 3 3 3 4 ...
##  $ safetyIncr2     : Factor w/ 3 levels "Base","Riskier",..: 2 2 2 2 2 2 2 2 2 1 ...
##  $ cylinderNum     : Factor w/ 5 levels "five","four",..: 2 2 5 2 1 1 1 1 1 1 ...
##  $ carManufacturer : Factor w/ 22 levels "alfa-romero",..: 1 1 1 2 2 2 2 2 2 2 ...

2.2.4 Visualisation

We chose price as the response variable, so we first look at how it is distributed. A log transform, log(price), may be preferable (as used in Intro to Python).

p1 <- ggplot(df, aes(x = price)) +
  geom_histogram(bins = 30, fill = "blue", alpha = 0.7)
  # xlab("Count") +
  # ylab("Price [$]")
  # labs(title = "Distribution of Car Prices", x = "price", y = "count")

p2 <- ggplot(df, aes(x = log(price))) +
  geom_histogram(bins = 30, fill = "red", alpha = 0.7)
  # xlab("Count") +
  # ylab("log(Price) [log($)]")
  # labs(title = "Distribution of Log(Price)", x = "log(price)", y = "count")

combo <- p1 | p2
combo

ggsave("fig/price_hist.png",     plot = p1, width = 6, height = 4, units = "in", dpi = 300)
ggsave("fig/logprice_hist.png",  plot = p2, width = 6, height = 4, units = "in", dpi = 300)

ggsave("fig/price_vs_logprice.png", plot = combo, width = 12, height = 4, units = "in", dpi = 300)

Next, we look at the numerical variables against price.

# price vs horsepower
ggplot(df, aes(x = horsepower, y = price)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  ggtitle("Price vs Horsepower") +
  theme_minimal()

# Price vs engine size
ggplot(df, aes(x = enginesize, y = price)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  ggtitle("Price vs Engine Size") +
  theme_minimal()

# Price vs highwaympg (could also use citympg)
ggplot(df, aes(x = highwaympg, y = price)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  ggtitle("Price vs MPG (Highway)") +
  theme_minimal()

# Price vs citympg (note: city vs highwaympg are likely to be correlated)
ggplot(df, aes(x = citympg, y = price)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  ggtitle("Price vs MPG (City)") +
  theme_minimal()

This is only a quick look, but a simple linear regression on the raw price has a clear problem: the fitted line predicts negative prices above roughly 40 mpg for both city and highway mpg.
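
The effect can be reproduced on made-up data. The sketch below assumes a synthetic price that decays exponentially with mpg (these are not the assignment data) and compares a straight-line fit on the raw price with a straight-line fit on log(price), both extrapolated to 50 mpg.

```r
# Synthetic data (assumption): price decays exponentially with mpg
mpg   <- 15:35
price <- 40000 * exp(-0.08 * (mpg - 15))
toy   <- data.frame(mpg = mpg, price = price)

fit_lin <- lm(price ~ mpg, data = toy)       # straight line on the raw scale
fit_log <- lm(log(price) ~ mpg, data = toy)  # straight line on the log scale

# Predict for a car outside the observed mpg range
new_car  <- data.frame(mpg = 50)
pred_lin <- unname(predict(fit_lin, newdata = new_car))
pred_log <- unname(exp(predict(fit_log, newdata = new_car)))  # back-transform

print(c(linear = pred_lin, log_linear = pred_log))
```

The raw-scale fit extrapolates to a negative price, whereas the back-transformed log-scale prediction is positive by construction, because exp() never returns a negative value.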

# Price vs fuel type

ggplot(df, aes(x = fueltype, y = price)) +
  geom_boxplot() +
  ggtitle("Price Distribution by Fuel Type") +
  theme_minimal()

# Price vs car body

ggplot(df, aes(x = carbody, y = price)) +
  geom_boxplot() +
  ggtitle("Price Distribution by Car Body") +
  theme_minimal()

# Price vs (new) manufacturer

p3 <- ggplot(df, aes(x = carManufacturer, y = price)) +
  geom_boxplot() +
  # ggtitle("Price Distribution by Manufacturer") +
  xlab("Car Manufacturer") +
  ylab("Price [$]") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

p4 <- ggplot(df, aes(x = carManufacturer, y = log(price))) +
  geom_boxplot() +
  # ggtitle("Price Distribution by Manufacturer") +
  xlab("Car Manufacturer") +
  ylab("log(Price) [log($)]") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
ggsave("fig/price_manufacturer_box.png",     plot = p3, width = 6, height = 4, units = "in", dpi = 300)
ggsave("fig/logprice_manufacturer_box.png",  plot = p4, width = 6, height = 4, units = "in", dpi = 300)

This is as expected, but still interesting to see: some manufacturers are markedly more expensive than others, and the price range across each manufacturer’s models also varies.

Based on our own investigation, we believe the data we were given are a subset of the data from the 1985 Ward’s Automotive Yearbook; a more comprehensive data set is available from https://archive.ics.uci.edu/dataset/10/automobile.

Note that this does not account for companies that own other brands (e.g., Toyota currently owns Lexus, so both could be grouped under toyota despite their different price points). We decided that this consideration was beyond the scope of this analysis.

3 Model creation

We chose log(highwaympg / price) and log(price) as our response variables.
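
The two responses are closely linked, because log(highwaympg / price) = log(highwaympg) - log(price). Since ordinary least squares is linear in the response, a model for the ratio fitted with the same regressors has coefficients equal to those of a log(highwaympg) model minus those of a log(price) model. The sketch below checks this on synthetic data; the column names mirror the report, but the values and the single regressor are assumptions made for illustration.

```r
set.seed(1)
# Synthetic data (assumption): names mirror the report, values are made up
n   <- 50
toy <- data.frame(curbweight = runif(n, 1500, 4000))
toy$highwaympg <- exp(4.5 - 4e-4 * toy$curbweight + rnorm(n, sd = 0.05))
toy$price      <- exp(7.5 + 7e-4 * toy$curbweight + rnorm(n, sd = 0.10))

fit_ratio <- lm(log(highwaympg / price) ~ curbweight, data = toy)
fit_mpg   <- lm(log(highwaympg) ~ curbweight, data = toy)
fit_price <- lm(log(price) ~ curbweight, data = toy)

# OLS is linear in the response, so the coefficients subtract
# (equal up to floating-point error)
coef_diff <- coef(fit_mpg) - coef(fit_price)
print(all.equal(coef(fit_ratio), coef_diff))
```

This is consistent with many coefficients changing sign between the MPG-per-Dollar fit and the price fit: a regressor that raises log(price) lowers the ratio unless it raises log(highwaympg) by more.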

3.1 Regressor selection

Table. Regressors initially selected for our model, with the data type (categorical, numeric (continuous), or numeric (discrete)) and the inclusion rationale. The categorical data were not broken down further into nominal, ordinal, and binary. * Denotes a custom factor (see § 2.1.2 Car manufacturer).

Variable            Type                    Inclusion rationale
aspiration          Categorical             Engine performance
carbody             Categorical             Car class and shape
carheight           Numeric (continuous)    Car size / weight
carManufacturer*    Categorical             Brand and reputation
compressionratio    Numeric (continuous)    Engine performance
curbweight          Numeric (discrete)      Car size / weight
cylinderNum         Categorical             Engine performance
carwidth            Numeric (continuous)    Car size / weight
enginelocation      Categorical             Car class and shape / Engine performance
enginetype          Categorical             Engine performance
enginesize          Numeric (discrete)      Engine performance
fuelsystem          Categorical             Engine performance
peakrpm             Numeric (discrete)      Engine performance
stroke              Numeric (continuous)    Engine performance (linked to torque)
wheelbase           Numeric (continuous)    Car size (also linked to ride quality)
horsepower          Numeric (discrete)      Engine performance

3.2 MPG-per-Dollar model

lmpg1 <- lm(
  log(highwaympg / price) ~ aspiration + carbody + carheight + carManufacturer +
    compressionratio + curbweight + cylinderNum + carwidth + enginelocation +
    enginetype + enginesize + fuelsystem + peakrpm + stroke + wheelbase +
    horsepower,
  data = df
)

print(summary(lmpg1))
## 
## Call:
## lm(formula = log(highwaympg/price) ~ aspiration + carbody + carheight + 
##     carManufacturer + compressionratio + curbweight + cylinderNum + 
##     carwidth + enginelocation + enginetype + enginesize + fuelsystem + 
##     peakrpm + stroke + wheelbase + horsepower, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30655 -0.07567 -0.00100  0.07592  0.32212 
## 
## Coefficients: (2 not defined because of singularities)
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -3.167e+00  9.935e-01  -3.188 0.001738 ** 
## aspirationturbo           -1.463e-01  5.633e-02  -2.598 0.010291 *  
## carbodyhardtop             1.721e-01  8.366e-02   2.057 0.041398 *  
## carbodyhatchback           2.443e-01  7.807e-02   3.129 0.002099 ** 
## carbodysedan               2.064e-01  7.893e-02   2.615 0.009811 ** 
## carbodywagon               2.419e-01  8.584e-02   2.818 0.005468 ** 
## carheight                  2.232e-02  9.233e-03   2.417 0.016812 *  
## carManufactureraudi        1.011e-02  1.483e-01   0.068 0.945728    
## carManufacturerbmw        -1.750e-01  1.403e-01  -1.247 0.214158    
## carManufacturerbuick       1.846e-01  1.631e-01   1.132 0.259428    
## carManufacturerchevrolet   4.549e-01  1.438e-01   3.163 0.001885 ** 
## carManufacturerdodge       3.990e-01  1.231e-01   3.241 0.001460 ** 
## carManufacturerhonda       3.183e-02  1.508e-01   0.211 0.833138    
## carManufacturerisuzu       2.130e-01  1.321e-01   1.612 0.109069    
## carManufacturerjaguar      4.124e-01  1.593e-01   2.589 0.010567 *  
## carManufacturermazda       1.683e-01  1.118e-01   1.505 0.134508    
## carManufacturermercury     2.507e-01  1.828e-01   1.371 0.172261    
## carManufacturermitsubishi  4.643e-01  1.224e-01   3.792 0.000214 ***
## carManufacturernissan      2.542e-01  1.082e-01   2.348 0.020153 *  
## carManufacturerpeugeot     9.500e-02  2.648e-01   0.359 0.720272    
## carManufacturerplymouth    4.305e-01  1.243e-01   3.463 0.000693 ***
## carManufacturerporsche    -2.739e-01  1.799e-01  -1.523 0.129855    
## carManufacturerrenault     3.335e-01  1.554e-01   2.145 0.033503 *  
## carManufacturersaab        6.646e-02  1.243e-01   0.535 0.593687    
## carManufacturersubaru      2.860e-01  1.096e-01   2.609 0.009990 ** 
## carManufacturertoyota      3.154e-01  1.014e-01   3.110 0.002231 ** 
## carManufacturervolkswagen  2.159e-01  1.178e-01   1.833 0.068750 .  
## carManufacturervolvo       2.337e-01  1.297e-01   1.802 0.073455 .  
## compressionratio           6.497e-02  2.599e-02   2.500 0.013483 *  
## curbweight                -7.338e-04  8.952e-05  -8.197 9.22e-14 ***
## cylinderNumfour            5.106e-02  9.722e-02   0.525 0.600196    
## cylinderNumgeq_eight      -2.566e-01  1.576e-01  -1.628 0.105641    
## cylinderNumleq_three      -4.678e-01  1.984e-01  -2.358 0.019628 *  
## cylinderNumsix            -2.902e-02  1.133e-01  -0.256 0.798189    
## carwidth                  -6.919e-03  1.522e-02  -0.455 0.650047    
## enginelocationrear        -2.790e-01  1.892e-01  -1.475 0.142272    
## enginetypedohcv            7.366e-01  2.796e-01   2.634 0.009302 ** 
## enginetypel                3.902e-01  2.326e-01   1.677 0.095499 .  
## enginetypeohc              4.051e-03  7.281e-02   0.056 0.955706    
## enginetypeohcf                    NA         NA      NA       NA    
## enginetypeohcv             6.259e-02  8.599e-02   0.728 0.467835    
## enginetyperotor                   NA         NA      NA       NA    
## enginesize                -1.513e-03  1.476e-03  -1.025 0.306948    
## fuelsystem2bbl            -2.239e-01  1.046e-01  -2.140 0.033939 *  
## fuelsystem4bbl            -1.828e-01  1.847e-01  -0.990 0.323843    
## fuelsystemidi             -9.496e-01  3.803e-01  -2.497 0.013577 *  
## fuelsystemmfi             -3.229e-01  1.846e-01  -1.749 0.082300 .  
## fuelsystemmpfi            -3.043e-01  1.111e-01  -2.738 0.006919 ** 
## fuelsystemspdi            -3.167e-01  1.290e-01  -2.454 0.015237 *  
## fuelsystemspfi            -2.251e-01  1.919e-01  -1.173 0.242699    
## peakrpm                   -1.031e-04  4.493e-05  -2.294 0.023149 *  
## stroke                     7.758e-02  6.606e-02   1.174 0.242070    
## wheelbase                 -1.891e-02  5.421e-03  -3.489 0.000634 ***
## horsepower                -1.002e-03  1.368e-03  -0.732 0.465102    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1294 on 153 degrees of freedom
## Multiple R-squared:  0.9742, Adjusted R-squared:  0.9656 
## F-statistic: 113.4 on 51 and 153 DF,  p-value: < 2.2e-16
lmpg2 <- stepAIC(lmpg1, direction = "both", trace = FALSE)
summary(lmpg2)
## 
## Call:
## lm(formula = log(highwaympg/price) ~ aspiration + carbody + carheight + 
##     carManufacturer + compressionratio + curbweight + enginetype + 
##     enginesize + fuelsystem + peakrpm + stroke + wheelbase, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32911 -0.08305  0.00000  0.07236  0.33359 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -3.476e+00  6.408e-01  -5.424 2.15e-07 ***
## aspirationturbo           -1.806e-01  4.199e-02  -4.302 2.96e-05 ***
## carbodyhardtop             1.830e-01  8.071e-02   2.268 0.024683 *  
## carbodyhatchback           2.378e-01  7.342e-02   3.239 0.001462 ** 
## carbodysedan               1.960e-01  7.504e-02   2.612 0.009882 ** 
## carbodywagon               2.286e-01  8.287e-02   2.759 0.006482 ** 
## carheight                  2.425e-02  8.797e-03   2.757 0.006528 ** 
## carManufactureraudi       -8.900e-02  1.222e-01  -0.728 0.467622    
## carManufacturerbmw        -2.345e-01  1.230e-01  -1.906 0.058460 .  
## carManufacturerbuick       8.776e-02  1.274e-01   0.689 0.492011    
## carManufacturerchevrolet   4.256e-01  1.396e-01   3.048 0.002705 ** 
## carManufacturerdodge       3.665e-01  1.160e-01   3.158 0.001903 ** 
## carManufacturerhonda      -2.311e-02  1.452e-01  -0.159 0.873759    
## carManufacturerisuzu       1.918e-01  1.272e-01   1.507 0.133686    
## carManufacturerjaguar      4.083e-01  1.499e-01   2.724 0.007172 ** 
## carManufacturermazda       1.286e-01  1.067e-01   1.205 0.230020    
## carManufacturermercury     1.827e-01  1.744e-01   1.047 0.296540    
## carManufacturermitsubishi  4.341e-01  1.173e-01   3.700 0.000297 ***
## carManufacturernissan      2.321e-01  1.046e-01   2.218 0.027952 *  
## carManufacturerpeugeot     5.891e-01  2.120e-01   2.778 0.006130 ** 
## carManufacturerplymouth    4.038e-01  1.186e-01   3.403 0.000844 ***
## carManufacturerporsche    -3.034e-01  1.728e-01  -1.756 0.081009 .  
## carManufacturerrenault     3.018e-01  1.450e-01   2.080 0.039095 *  
## carManufacturersaab        2.746e-02  1.192e-01   0.230 0.818051    
## carManufacturersubaru      5.974e-01  1.995e-01   2.995 0.003189 ** 
## carManufacturertoyota      2.768e-01  9.659e-02   2.866 0.004728 ** 
## carManufacturervolkswagen  1.605e-01  1.098e-01   1.461 0.146011    
## carManufacturervolvo       2.006e-01  1.229e-01   1.632 0.104665    
## compressionratio           5.739e-02  2.523e-02   2.274 0.024292 *  
## curbweight                -7.277e-04  7.299e-05  -9.970  < 2e-16 ***
## enginetypedohcv            3.821e-01  1.914e-01   1.997 0.047579 *  
## enginetypel               -1.135e-01  1.780e-01  -0.638 0.524720    
## enginetypeohc              3.254e-02  5.990e-02   0.543 0.587698    
## enginetypeohcf            -3.025e-01  1.743e-01  -1.736 0.084563 .  
## enginetypeohcv             1.010e-02  7.839e-02   0.129 0.897646    
## enginetyperotor           -5.791e-01  1.543e-01  -3.752 0.000246 ***
## enginesize                -3.318e-03  9.547e-04  -3.475 0.000659 ***
## fuelsystem2bbl            -2.395e-01  1.032e-01  -2.321 0.021539 *  
## fuelsystem4bbl            -1.808e-01  1.833e-01  -0.986 0.325467    
## fuelsystemidi             -8.276e-01  3.694e-01  -2.240 0.026468 *  
## fuelsystemmfi             -3.266e-01  1.802e-01  -1.812 0.071828 .  
## fuelsystemmpfi            -3.189e-01  1.073e-01  -2.972 0.003420 ** 
## fuelsystemspdi            -3.381e-01  1.260e-01  -2.682 0.008087 ** 
## fuelsystemspfi            -2.437e-01  1.877e-01  -1.298 0.196132    
## peakrpm                   -1.133e-04  3.885e-05  -2.915 0.004068 ** 
## stroke                     1.073e-01  6.101e-02   1.760 0.080428 .  
## wheelbase                 -1.933e-02  4.659e-03  -4.149 5.43e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1289 on 158 degrees of freedom
## Multiple R-squared:  0.9736, Adjusted R-squared:  0.9659 
## F-statistic: 126.6 on 46 and 158 DF,  p-value: < 2.2e-16

3.3 Price model

lprice1 <- lm(
  log(price) ~ aspiration + carbody + carheight + carManufacturer +
    compressionratio + curbweight + cylinderNum + carwidth + enginelocation +
    enginetype + enginesize + fuelsystem + peakrpm + stroke + wheelbase +
    horsepower,
  data = df
)

print(summary(lprice1))
## 
## Call:
## lm(formula = log(price) ~ aspiration + carbody + carheight + 
##     carManufacturer + compressionratio + curbweight + cylinderNum + 
##     carwidth + enginelocation + enginetype + enginesize + fuelsystem + 
##     peakrpm + stroke + wheelbase + horsepower, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.28750 -0.06471  0.00000  0.06954  0.32894 
## 
## Coefficients: (2 not defined because of singularities)
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                6.821e+00  8.709e-01   7.833 7.44e-13 ***
## aspirationturbo            9.086e-02  4.938e-02   1.840 0.067680 .  
## carbodyhardtop            -1.647e-01  7.333e-02  -2.246 0.026163 *  
## carbodyhatchback          -2.017e-01  6.843e-02  -2.947 0.003715 ** 
## carbodysedan              -1.538e-01  6.919e-02  -2.224 0.027642 *  
## carbodywagon              -1.837e-01  7.525e-02  -2.441 0.015785 *  
## carheight                 -2.542e-02  8.094e-03  -3.140 0.002026 ** 
## carManufactureraudi        3.645e-02  1.300e-01   0.280 0.779589    
## carManufacturerbmw         2.676e-01  1.230e-01   2.176 0.031069 *  
## carManufacturerbuick      -3.809e-02  1.430e-01  -0.266 0.790300    
## carManufacturerchevrolet  -2.532e-01  1.261e-01  -2.008 0.046388 *  
## carManufacturerdodge      -3.067e-01  1.079e-01  -2.843 0.005082 ** 
## carManufacturerhonda      -4.979e-02  1.322e-01  -0.377 0.706965    
## carManufacturerisuzu      -9.446e-02  1.158e-01  -0.815 0.416088    
## carManufacturerjaguar     -2.703e-01  1.396e-01  -1.936 0.054748 .  
## carManufacturermazda      -1.222e-01  9.804e-02  -1.247 0.214357    
## carManufacturermercury    -1.905e-01  1.603e-01  -1.188 0.236484    
## carManufacturermitsubishi -3.784e-01  1.073e-01  -3.526 0.000558 ***
## carManufacturernissan     -1.705e-01  9.488e-02  -1.797 0.074376 .  
## carManufacturerpeugeot    -3.600e-01  2.321e-01  -1.551 0.122974    
## carManufacturerplymouth   -3.274e-01  1.090e-01  -3.004 0.003116 ** 
## carManufacturerporsche     2.564e-01  1.577e-01   1.626 0.106000    
## carManufacturerrenault    -2.932e-01  1.363e-01  -2.152 0.032996 *  
## carManufacturersaab       -7.567e-03  1.090e-01  -0.069 0.944726    
## carManufacturersubaru     -2.891e-01  9.612e-02  -3.008 0.003074 ** 
## carManufacturertoyota     -2.383e-01  8.888e-02  -2.681 0.008137 ** 
## carManufacturervolkswagen -1.539e-01  1.033e-01  -1.491 0.138121    
## carManufacturervolvo      -1.400e-01  1.137e-01  -1.231 0.220043    
## compressionratio          -2.733e-02  2.278e-02  -1.199 0.232244    
## curbweight                 4.902e-04  7.847e-05   6.247 3.95e-09 ***
## cylinderNumfour            7.249e-02  8.522e-02   0.851 0.396291    
## cylinderNumgeq_eight       1.541e-01  1.382e-01   1.115 0.266477    
## cylinderNumleq_three       3.363e-01  1.739e-01   1.934 0.054977 .  
## cylinderNumsix             9.797e-02  9.933e-02   0.986 0.325516    
## carwidth                   2.344e-02  1.334e-02   1.757 0.080970 .  
## enginelocationrear         4.170e-01  1.658e-01   2.514 0.012958 *  
## enginetypedohcv           -2.573e-01  2.451e-01  -1.050 0.295538    
## enginetypel               -3.644e-02  2.039e-01  -0.179 0.858407    
## enginetypeohc              1.990e-03  6.383e-02   0.031 0.975173    
## enginetypeohcf                    NA         NA      NA       NA    
## enginetypeohcv            -6.583e-02  7.538e-02  -0.873 0.383855    
## enginetyperotor                   NA         NA      NA       NA    
## enginesize                 9.166e-04  1.293e-03   0.709 0.479625    
## fuelsystem2bbl             1.133e-01  9.170e-02   1.235 0.218565    
## fuelsystem4bbl            -3.051e-03  1.619e-01  -0.019 0.984994    
## fuelsystemidi              5.265e-01  3.333e-01   1.580 0.116263    
## fuelsystemmfi              1.139e-01  1.618e-01   0.704 0.482450    
## fuelsystemmpfi             1.719e-01  9.742e-02   1.764 0.079664 .  
## fuelsystemspdi             1.608e-01  1.131e-01   1.422 0.157127    
## fuelsystemspfi             1.048e-02  1.682e-01   0.062 0.950399    
## peakrpm                    4.067e-05  3.939e-05   1.033 0.303466    
## stroke                    -5.030e-02  5.791e-02  -0.869 0.386405    
## wheelbase                  1.249e-02  4.752e-03   2.628 0.009453 ** 
## horsepower                 4.508e-04  1.199e-03   0.376 0.707552    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1134 on 153 degrees of freedom
## Multiple R-squared:  0.962,  Adjusted R-squared:  0.9493 
## F-statistic: 75.95 on 51 and 153 DF,  p-value: < 2.2e-16
lprice2 <- stepAIC(lprice1, direction = "both", trace = FALSE)
summary(lprice2)
## 
## Call:
## lm(formula = log(price) ~ aspiration + carbody + carheight + 
##     carManufacturer + curbweight + enginetype + wheelbase + horsepower, 
##     data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.28761 -0.06687  0.00000  0.07021  0.31517 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                8.068e+00  3.711e-01  21.738  < 2e-16 ***
## aspirationturbo            7.751e-02  2.844e-02   2.725 0.007105 ** 
## carbodyhardtop            -2.049e-01  6.971e-02  -2.940 0.003749 ** 
## carbodyhatchback          -2.518e-01  6.264e-02  -4.021 8.75e-05 ***
## carbodysedan              -1.919e-01  6.397e-02  -3.000 0.003106 ** 
## carbodywagon              -2.242e-01  7.024e-02  -3.192 0.001687 ** 
## carheight                 -3.161e-02  7.190e-03  -4.396 1.95e-05 ***
## carManufactureraudi        9.737e-02  9.814e-02   0.992 0.322541    
## carManufacturerbmw         3.404e-01  1.011e-01   3.368 0.000939 ***
## carManufacturerbuick       5.197e-02  1.032e-01   0.504 0.615054    
## carManufacturerchevrolet  -2.444e-01  1.160e-01  -2.107 0.036575 *  
## carManufacturerdodge      -2.764e-01  9.197e-02  -3.006 0.003054 ** 
## carManufacturerhonda      -1.081e-01  8.892e-02  -1.216 0.225719    
## carManufacturerisuzu      -1.227e-01  1.001e-01  -1.225 0.222270    
## carManufacturerjaguar     -2.539e-01  1.150e-01  -2.207 0.028639 *  
## carManufacturermazda      -6.493e-02  8.991e-02  -0.722 0.471184    
## carManufacturermercury    -1.417e-01  1.508e-01  -0.939 0.348823    
## carManufacturermitsubishi -3.280e-01  9.031e-02  -3.632 0.000374 ***
## carManufacturernissan     -1.520e-01  8.494e-02  -1.789 0.075347 .  
## carManufacturerpeugeot    -4.929e-01  1.786e-01  -2.760 0.006415 ** 
## carManufacturerplymouth   -2.955e-01  9.374e-02  -3.152 0.001918 ** 
## carManufacturerporsche     3.614e-01  1.448e-01   2.495 0.013546 *  
## carManufacturerrenault    -1.881e-01  1.158e-01  -1.625 0.106099    
## carManufacturersaab        7.292e-02  9.707e-02   0.751 0.453588    
## carManufacturersubaru     -5.630e-01  1.739e-01  -3.237 0.001454 ** 
## carManufacturertoyota     -2.025e-01  8.270e-02  -2.449 0.015349 *  
## carManufacturervolkswagen -7.308e-02  8.890e-02  -0.822 0.412233    
## carManufacturervolvo      -6.686e-02  9.772e-02  -0.684 0.494799    
## curbweight                 5.014e-04  5.479e-05   9.151  < 2e-16 ***
## enginetypedohcv           -3.109e-01  1.743e-01  -1.784 0.076304 .  
## enginetypel                1.997e-01  1.504e-01   1.327 0.186184    
## enginetypeohc             -5.452e-03  4.769e-02  -0.114 0.909122    
## enginetypeohcf             3.326e-01  1.495e-01   2.224 0.027458 *  
## enginetypeohcv             1.294e-02  5.709e-02   0.227 0.820933    
## enginetyperotor            1.067e-01  8.186e-02   1.304 0.194089    
## wheelbase                  1.822e-02  3.854e-03   4.727 4.79e-06 ***
## horsepower                 2.097e-03  6.280e-04   3.339 0.001034 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1128 on 168 degrees of freedom
## Multiple R-squared:  0.9588, Adjusted R-squared:  0.9499 
## F-statistic: 108.5 on 36 and 168 DF,  p-value: < 2.2e-16

3.4 Harmonising both models

lmpg3 <- lm(
  log(highwaympg / price) ~ aspiration + carbody + carheight + carManufacturer +
    curbweight + enginetype + enginesize + fuelsystem + peakrpm + wheelbase,
  data = df
)

print(summary(lmpg3), digits = 3)
## 
## Call:
## lm(formula = log(highwaympg/price) ~ aspiration + carbody + carheight + 
##     carManufacturer + curbweight + enginetype + enginesize + 
##     fuelsystem + peakrpm + wheelbase, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.2894 -0.0717  0.0000  0.0717  0.3138 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -2.57e+00   5.36e-01   -4.79  3.7e-06 ***
## aspirationturbo           -2.15e-01   3.85e-02   -5.58  9.9e-08 ***
## carbodyhardtop             2.11e-01   8.10e-02    2.61  0.00996 ** 
## carbodyhatchback           2.47e-01   7.41e-02    3.33  0.00107 ** 
## carbodysedan               2.09e-01   7.59e-02    2.75  0.00662 ** 
## carbodywagon               2.43e-01   8.38e-02    2.90  0.00432 ** 
## carheight                  2.20e-02   8.65e-03    2.54  0.01210 *  
## carManufactureraudi       -6.79e-02   1.24e-01   -0.55  0.58378    
## carManufacturerbmw        -2.49e-01   1.24e-01   -2.01  0.04561 *  
## carManufacturerbuick       6.49e-02   1.29e-01    0.50  0.61425    
## carManufacturerchevrolet   4.69e-01   1.40e-01    3.34  0.00104 ** 
## carManufacturerdodge       3.97e-01   1.16e-01    3.44  0.00075 ***
## carManufacturerhonda       4.31e-02   1.43e-01    0.30  0.76290    
## carManufacturerisuzu       2.18e-01   1.28e-01    1.70  0.09182 .  
## carManufacturerjaguar      4.59e-01   1.50e-01    3.06  0.00263 ** 
## carManufacturermazda       1.38e-01   1.08e-01    1.28  0.20352    
## carManufacturermercury     1.75e-01   1.76e-01    1.00  0.31980    
## carManufacturermitsubishi  4.63e-01   1.17e-01    3.97  0.00011 ***
## carManufacturernissan      2.71e-01   1.04e-01    2.61  0.00987 ** 
## carManufacturerpeugeot     5.47e-01   2.14e-01    2.56  0.01147 *  
## carManufacturerplymouth    4.43e-01   1.18e-01    3.75  0.00025 ***
## carManufacturerporsche    -2.66e-01   1.74e-01   -1.53  0.12858    
## carManufacturerrenault     3.79e-01   1.40e-01    2.71  0.00754 ** 
## carManufacturersaab        4.89e-02   1.17e-01    0.42  0.67631    
## carManufacturersubaru      5.86e-01   2.02e-01    2.90  0.00432 ** 
## carManufacturertoyota      3.01e-01   9.76e-02    3.08  0.00240 ** 
## carManufacturervolkswagen  2.09e-01   1.10e-01    1.91  0.05816 .  
## carManufacturervolvo       2.32e-01   1.23e-01    1.90  0.05986 .  
## curbweight                -7.46e-04   7.31e-05  -10.21  < 2e-16 ***
## enginetypedohcv            3.94e-01   1.93e-01    2.04  0.04313 *  
## enginetypel               -1.28e-01   1.81e-01   -0.71  0.48001    
## enginetypeohc              2.24e-02   6.06e-02    0.37  0.71188    
## enginetypeohcf            -3.48e-01   1.76e-01   -1.98  0.04920 *  
## enginetypeohcv            -2.21e-02   7.35e-02   -0.30  0.76449    
## enginetyperotor           -5.34e-01   1.55e-01   -3.43  0.00076 ***
## enginesize                -3.01e-03   9.40e-04   -3.21  0.00162 ** 
## fuelsystem2bbl            -2.39e-01   1.05e-01   -2.28  0.02388 *  
## fuelsystem4bbl            -1.83e-01   1.86e-01   -0.99  0.32532    
## fuelsystemidi             -1.46e-02   1.17e-01   -0.12  0.90078    
## fuelsystemmfi             -3.52e-01   1.81e-01   -1.94  0.05365 .  
## fuelsystemmpfi            -3.22e-01   1.09e-01   -2.97  0.00345 ** 
## fuelsystemspdi            -3.75e-01   1.26e-01   -2.98  0.00332 ** 
## fuelsystemspfi            -2.32e-01   1.90e-01   -1.22  0.22489    
## peakrpm                   -1.19e-04   3.89e-05   -3.05  0.00266 ** 
## wheelbase                 -1.84e-02   4.71e-03   -3.90  0.00014 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.131 on 160 degrees of freedom
## Multiple R-squared:  0.972,  Adjusted R-squared:  0.965 
## F-statistic:  128 on 44 and 160 DF,  p-value: <2e-16
AIC(lmpg3)
## [1] -211.0849
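
As a cross-check on the value above, the AIC of a Gaussian linear model can be rebuilt from its residual sum of squares. The sketch below assumes R's convention of counting the error variance as one extra parameter. `aic_by_hand` is a hypothetical helper, `mtcars` is used only as a stand-in dataset, and the RSS of 2.7365 is the rounded figure that the ANOVA table in this section reports for this model, so agreement with `AIC(lmpg3)` is only expected to about two decimal places.

```r
# Hypothetical helper: AIC of an lm fit from its residual sum of squares.
# n = observations, p = estimated coefficients (including the intercept).
aic_by_hand <- function(rss, n, p) {
  loglik <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)
  -2 * loglik + 2 * (p + 1) # +1 for the error variance
}

# Sanity check against AIC() on a built-in dataset (illustrative only)
fit <- lm(log(mpg) ~ wt + hp, data = mtcars)
aic_check <- aic_by_hand(sum(residuals(fit)^2), nobs(fit), length(coef(fit)))
all.equal(aic_check, AIC(fit))

# Rounded figures printed for lmpg3: RSS 2.7365, n = 205, 45 coefficients
aic_lmpg3 <- aic_by_hand(2.7365, 205, 45)
round(aic_lmpg3, 2)
```

Because the harmonised models have different response variables, their AIC values describe fit on different scales and are not directly comparable with each other.
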
par(mfrow = c(2, 2))
plot(lmpg3)

print(anova(lmpg1, lmpg3))
## Analysis of Variance Table
## 
## Model 1: log(highwaympg/price) ~ aspiration + carbody + carheight + carManufacturer + 
##     compressionratio + curbweight + cylinderNum + carwidth + 
##     enginelocation + enginetype + enginesize + fuelsystem + peakrpm + 
##     stroke + wheelbase + horsepower
## Model 2: log(highwaympg/price) ~ aspiration + carbody + carheight + carManufacturer + 
##     curbweight + enginetype + enginesize + fuelsystem + peakrpm + 
##     wheelbase
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    153 2.5606                           
## 2    160 2.7365 -7  -0.17586 1.5012 0.1708
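
The F statistic in the table above can be reproduced by hand from the two residual sums of squares. This is a minimal sketch that uses the rounded values as printed, so the result may differ from the table in the last decimal place; the variable names are illustrative.

```r
# Figures copied from the ANOVA table above (rounded as printed)
rss_full <- 2.5606 # lmpg1, 153 residual df
rss_reduced <- 2.7365 # lmpg3, 160 residual df
df_full <- 153
df_reduced <- 160

# Partial F statistic: extra RSS per dropped degree of freedom,
# divided by the error variance estimate of the full model
df_diff <- df_reduced - df_full
f_stat <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)
p_value <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 4)
```

A p-value of roughly 0.17 gives no evidence at the 5% level that the seven extra degrees of freedom in the full model improve the fit.
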
lprice3 <- lm(
  log(price) ~ aspiration + carbody + carheight + carManufacturer +
    curbweight + enginetype + enginesize + fuelsystem + peakrpm + wheelbase,
  data = df
)

print(summary(lprice3), digits = 3)
## 
## Call:
## lm(formula = log(price) ~ aspiration + carbody + carheight + 
##     carManufacturer + curbweight + enginetype + enginesize + 
##     fuelsystem + peakrpm + wheelbase, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30640 -0.05947  0.00299  0.06435  0.31452 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                7.74e+00   4.63e-01   16.71  < 2e-16 ***
## aspirationturbo            1.22e-01   3.32e-02    3.66  0.00034 ***
## carbodyhardtop            -1.92e-01   7.00e-02   -2.74  0.00686 ** 
## carbodyhatchback          -2.14e-01   6.41e-02   -3.34  0.00103 ** 
## carbodysedan              -1.66e-01   6.56e-02   -2.53  0.01237 *  
## carbodywagon              -1.95e-01   7.24e-02   -2.70  0.00778 ** 
## carheight                 -2.90e-02   7.47e-03   -3.87  0.00016 ***
## carManufactureraudi        5.70e-02   1.07e-01    0.53  0.59397    
## carManufacturerbmw         3.14e-01   1.07e-01    2.93  0.00385 ** 
## carManufacturerbuick      -2.16e-03   1.11e-01   -0.02  0.98451    
## carManufacturerchevrolet  -2.58e-01   1.21e-01   -2.13  0.03483 *  
## carManufacturerdodge      -3.07e-01   9.98e-02   -3.07  0.00248 ** 
## carManufacturerhonda      -3.24e-02   1.23e-01   -0.26  0.79309    
## carManufacturerisuzu      -1.09e-01   1.11e-01   -0.99  0.32579    
## carManufacturerjaguar     -2.96e-01   1.30e-01   -2.28  0.02385 *  
## carManufacturermazda      -7.95e-02   9.32e-02   -0.85  0.39474    
## carManufacturermercury    -1.03e-01   1.52e-01   -0.68  0.49956    
## carManufacturermitsubishi -3.66e-01   1.01e-01   -3.64  0.00037 ***
## carManufacturernissan     -1.69e-01   8.98e-02   -1.88  0.06227 .  
## carManufacturerpeugeot    -5.22e-01   1.85e-01   -2.82  0.00538 ** 
## carManufacturerplymouth   -3.32e-01   1.02e-01   -3.25  0.00141 ** 
## carManufacturerporsche     3.31e-01   1.50e-01    2.21  0.02886 *  
## carManufacturerrenault    -2.77e-01   1.21e-01   -2.29  0.02327 *  
## carManufacturersaab        5.05e-02   1.01e-01    0.50  0.61735    
## carManufacturersubaru     -6.31e-01   1.75e-01   -3.61  0.00041 ***
## carManufacturertoyota     -2.20e-01   8.43e-02   -2.61  0.00994 ** 
## carManufacturervolkswagen -1.27e-01   9.48e-02   -1.34  0.18266    
## carManufacturervolvo      -9.87e-02   1.06e-01   -0.93  0.35222    
## curbweight                 5.09e-04   6.31e-05    8.06  1.7e-13 ***
## enginetypedohcv           -9.95e-02   1.67e-01   -0.60  0.55204    
## enginetypel                1.94e-01   1.56e-01    1.24  0.21637    
## enginetypeohc             -6.27e-03   5.23e-02   -0.12  0.90474    
## enginetypeohcf             3.92e-01   1.52e-01    2.59  0.01062 *  
## enginetypeohcv            -2.50e-03   6.35e-02   -0.04  0.96864    
## enginetyperotor            2.66e-01   1.34e-01    1.98  0.04941 *  
## enginesize                 1.62e-03   8.12e-04    1.99  0.04835 *  
## fuelsystem2bbl             1.30e-01   9.04e-02    1.43  0.15342    
## fuelsystem4bbl             7.17e-03   1.60e-01    0.04  0.96440    
## fuelsystemidi              1.36e-01   1.01e-01    1.35  0.18009    
## fuelsystemmfi              1.51e-01   1.56e-01    0.96  0.33628    
## fuelsystemmpfi             1.88e-01   9.38e-02    2.01  0.04636 *  
## fuelsystemspdi             1.96e-01   1.09e-01    1.80  0.07399 .  
## fuelsystemspfi             4.71e-02   1.64e-01    0.29  0.77478    
## peakrpm                    4.59e-05   3.36e-05    1.36  0.17449    
## wheelbase                  1.60e-02   4.07e-03    3.92  0.00013 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.113 on 160 degrees of freedom
## Multiple R-squared:  0.961,  Adjusted R-squared:  0.95 
## F-statistic: 88.5 on 44 and 160 DF,  p-value: <2e-16
AIC(lprice3)
## [1] -271.0404
par(bg = "white")
par(mfrow = c(2, 2))
plot(lprice3)

par(mfrow = c(1, 1))
vif(lmpg3)
##                         GVIF Df GVIF^(1/(2*Df))
## aspiration      2.625732e+00  1        1.620411
## carbody         8.802230e+00  4        1.312424
## carheight       5.329608e+00  1        2.308594
## carManufacturer 3.822617e+06 21        1.434574
## curbweight      1.726727e+01  1        4.155390
## enginetype      2.026449e+04  6        2.285044
## enginesize      1.827302e+01  1        4.274695
## fuelsystem      9.183312e+02  7        1.627957
## peakrpm         4.111872e+00  1        2.027775
## wheelbase       9.608223e+00  1        3.099713
vif(lprice3)
##                         GVIF Df GVIF^(1/(2*Df))
## aspiration      2.625732e+00  1        1.620411
## carbody         8.802230e+00  4        1.312424
## carheight       5.329608e+00  1        2.308594
## carManufacturer 3.822617e+06 21        1.434574
## curbweight      1.726727e+01  1        4.155390
## enginetype      2.026449e+04  6        2.285044
## enginesize      1.827302e+01  1        4.274695
## fuelsystem      9.183312e+02  7        1.627957
## peakrpm         4.111872e+00  1        2.027775
## wheelbase       9.608223e+00  1        3.099713
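
The third column of the `vif()` output rescales each GVIF by the degrees of freedom of its term, so that multi-level factors such as carManufacturer can be compared with single-coefficient terms. The sketch below recomputes that column for a few terms from the printed GVIF and Df values; the vector names are illustrative.

```r
# GVIF and Df copied from the vif() output above (identical for both models,
# since they share the same predictors)
gvif <- c(
  carManufacturer = 3.822617e+06,
  enginetype      = 2.026449e+04,
  curbweight      = 1.726727e+01,
  enginesize      = 1.827302e+01
)
df_term <- c(carManufacturer = 21, enginetype = 6, curbweight = 1, enginesize = 1)

# Rescale so terms with different numbers of coefficients are comparable
gvif_adj <- gvif^(1 / (2 * df_term))
round(gvif_adj, 3)
```

For one-degree-of-freedom terms the adjusted value is simply the square root of the usual VIF, so on this scale curbweight and enginesize are the terms most affected by collinearity.
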
summ(lmpg3, digits = 3)
Observations 205
Dependent variable log(highwaympg/price)
Type OLS linear regression
F(44,160) 128.404
R² 0.972
Adj. R² 0.965
Est. S.E. t val. p
(Intercept) -2.569 0.536 -4.794 0.000
aspirationturbo -0.215 0.038 -5.583 0.000
carbodyhardtop 0.211 0.081 2.608 0.010
carbodyhatchback 0.247 0.074 3.332 0.001
carbodysedan 0.209 0.076 2.751 0.007
carbodywagon 0.243 0.084 2.895 0.004
carheight 0.022 0.009 2.538 0.012
carManufactureraudi -0.068 0.124 -0.549 0.584
carManufacturerbmw -0.249 0.124 -2.015 0.046
carManufacturerbuick 0.065 0.129 0.505 0.614
carManufacturerchevrolet 0.469 0.140 3.341 0.001
carManufacturerdodge 0.397 0.116 3.435 0.001
carManufacturerhonda 0.043 0.143 0.302 0.763
carManufacturerisuzu 0.218 0.128 1.696 0.092
carManufacturerjaguar 0.459 0.150 3.056 0.003
carManufacturermazda 0.138 0.108 1.277 0.204
carManufacturermercury 0.175 0.176 0.998 0.320
carManufacturermitsubishi 0.463 0.117 3.972 0.000
carManufacturernissan 0.271 0.104 2.611 0.010
carManufacturerpeugeot 0.547 0.214 2.558 0.011
carManufacturerplymouth 0.443 0.118 3.751 0.000
carManufacturerporsche -0.266 0.174 -1.528 0.129
carManufacturerrenault 0.379 0.140 2.706 0.008
carManufacturersaab 0.049 0.117 0.418 0.676
carManufacturersubaru 0.586 0.202 2.895 0.004
carManufacturertoyota 0.301 0.098 3.084 0.002
carManufacturervolkswagen 0.209 0.110 1.908 0.058
carManufacturervolvo 0.232 0.123 1.895 0.060
curbweight -0.001 0.000 -10.206 0.000
enginetypedohcv 0.394 0.193 2.039 0.043
enginetypel -0.128 0.181 -0.708 0.480
enginetypeohc 0.022 0.061 0.370 0.712
enginetypeohcf -0.348 0.176 -1.982 0.049
enginetypeohcv -0.022 0.073 -0.300 0.764
enginetyperotor -0.534 0.155 -3.432 0.001
enginesize -0.003 0.001 -3.207 0.002
fuelsystem2bbl -0.239 0.105 -2.281 0.024
fuelsystem4bbl -0.183 0.186 -0.987 0.325
fuelsystemidi -0.015 0.117 -0.125 0.901
fuelsystemmfi -0.352 0.181 -1.944 0.054
fuelsystemmpfi -0.322 0.109 -2.968 0.003
fuelsystemspdi -0.375 0.126 -2.981 0.003
fuelsystemspfi -0.232 0.190 -1.218 0.225
peakrpm -0.000 0.000 -3.052 0.003
wheelbase -0.018 0.005 -3.902 0.000
Standard errors: OLS
summ(lprice3, digits = 3)
Observations 205
Dependent variable log(price)
Type OLS linear regression
F(44,160) 88.546
R² 0.961
Adj. R² 0.950
Est. S.E. t val. p
(Intercept) 7.736 0.463 16.706 0.000
aspirationturbo 0.122 0.033 3.660 0.000
carbodyhardtop -0.192 0.070 -2.739 0.007
carbodyhatchback -0.214 0.064 -3.345 0.001
carbodysedan -0.166 0.066 -2.530 0.012
carbodywagon -0.195 0.072 -2.695 0.008
carheight -0.029 0.007 -3.875 0.000
carManufactureraudi 0.057 0.107 0.534 0.594
carManufacturerbmw 0.314 0.107 2.933 0.004
carManufacturerbuick -0.002 0.111 -0.019 0.985
carManufacturerchevrolet -0.258 0.121 -2.128 0.035
carManufacturerdodge -0.307 0.100 -3.074 0.002
carManufacturerhonda -0.032 0.123 -0.263 0.793
carManufacturerisuzu -0.109 0.111 -0.986 0.326
carManufacturerjaguar -0.296 0.130 -2.281 0.024
carManufacturermazda -0.079 0.093 -0.853 0.395
carManufacturermercury -0.103 0.152 -0.677 0.500
carManufacturermitsubishi -0.366 0.101 -3.635 0.000
carManufacturernissan -0.169 0.090 -1.878 0.062
carManufacturerpeugeot -0.522 0.185 -2.822 0.005
carManufacturerplymouth -0.332 0.102 -3.250 0.001
carManufacturerporsche 0.331 0.150 2.205 0.029
carManufacturerrenault -0.277 0.121 -2.291 0.023
carManufacturersaab 0.051 0.101 0.501 0.617
carManufacturersubaru -0.631 0.175 -3.611 0.000
carManufacturertoyota -0.220 0.084 -2.609 0.010
carManufacturervolkswagen -0.127 0.095 -1.338 0.183
carManufacturervolvo -0.099 0.106 -0.933 0.352
curbweight 0.001 0.000 8.058 0.000
enginetypedohcv -0.099 0.167 -0.596 0.552
enginetypel 0.194 0.156 1.241 0.216
enginetypeohc -0.006 0.052 -0.120 0.905
enginetypeohcf 0.392 0.152 2.585 0.011
enginetypeohcv -0.003 0.063 -0.039 0.969
enginetyperotor 0.266 0.134 1.980 0.049
enginesize 0.002 0.001 1.990 0.048
fuelsystem2bbl 0.130 0.090 1.434 0.153
fuelsystem4bbl 0.007 0.160 0.045 0.964
fuelsystemidi 0.136 0.101 1.346 0.180
fuelsystemmfi 0.151 0.156 0.964 0.336
fuelsystemmpfi 0.188 0.094 2.008 0.046
fuelsystemspdi 0.196 0.109 1.798 0.074
fuelsystemspfi 0.047 0.164 0.287 0.775
peakrpm 0.000 0.000 1.364 0.174
wheelbase 0.016 0.004 3.920 0.000
Standard errors: OLS

3.5 Creating custom plots for report

save_diag_side_by_side <- function(
    m_left, m_right,
    left_label = "Model: log(highwaympg/price)",
    right_label = "Model: log(price)",
    out_dir = "fig",
    file = "diag_side_by_side.png",
    width = 8,
    height = 12,
    res = 200,
    pointsize = 10) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(out_dir, file)

  png(
    filename = path, units = "in", width = width, height = height,
    res = res, pointsize = pointsize, bg = "white"
  )

  op <- par(no.readonly = TRUE)
  on.exit(
    {
      par(op)
      dev.off()
    },
    add = TRUE
  )

  par(bg = "white", mfcol = c(4, 2), mar = c(4, 4, 2, 1), oma = c(2, 2, 4, 0.5))
  plot(m_left, which = c(1, 2, 3, 5), sub.caption = "", ask = FALSE)
  plot(m_right, which = c(1, 2, 3, 5), sub.caption = "", ask = FALSE)

  # Column headers
  # mtext(left_label, side = 3, line = 1.2, at = 0.25, outer = TRUE, font = 2)
  # mtext(right_label, side = 3, line = 1.2, at = 0.75, outer = TRUE, font = 2)
  lab <- function(txt) as.expression(bquote(bold(underline(.(txt)))))

  mtext(lab(left_label),
    side = 3, line = 1.1, at = 0.28,
    outer = TRUE, adj = 0.5
  )

  mtext(lab(right_label),
    side = 3, line = 1.1, at = 0.77,
    outer = TRUE, adj = 0.5
  )

  invisible(path)
}

save_diag_side_by_side(lmpg3, lprice3,
  out_dir = "fig",
  file = "diag_both.png",
  width = 7, height = 10, res = 300
)
Diagnostics for both models


3.6 Quick check: log(highwaympg)

mdl.f_highwaympg3 <- lm(
  log(highwaympg) ~ aspiration + carbody + carheight + carManufacturer +
    curbweight + enginetype + enginesize + fuelsystem + peakrpm + wheelbase,
  data = df
)

print(summary(mdl.f_highwaympg3), digits = 3)
## 
## Call:
## lm(formula = log(highwaympg) ~ aspiration + carbody + carheight + 
##     carManufacturer + curbweight + enginetype + enginesize + 
##     fuelsystem + peakrpm + wheelbase, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.2201 -0.0281  0.0000  0.0375  0.2392 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                5.17e+00   2.87e-01   18.02  < 2e-16 ***
## aspirationturbo           -9.32e-02   2.06e-02   -4.53  1.2e-05 ***
## carbodyhardtop             1.96e-02   4.33e-02    0.45   0.6517    
## carbodyhatchback           3.28e-02   3.97e-02    0.83   0.4090    
## carbodysedan               4.29e-02   4.06e-02    1.06   0.2922    
## carbodywagon               4.75e-02   4.48e-02    1.06   0.2909    
## carheight                 -7.00e-03   4.63e-03   -1.51   0.1322    
## carManufactureraudi       -1.08e-02   6.61e-02   -0.16   0.8703    
## carManufacturerbmw         6.43e-02   6.62e-02    0.97   0.3329    
## carManufacturerbuick       6.28e-02   6.88e-02    0.91   0.3627    
## carManufacturerchevrolet   2.11e-01   7.51e-02    2.81   0.0056 ** 
## carManufacturerdodge       9.00e-02   6.18e-02    1.46   0.1471    
## carManufacturerhonda       1.07e-02   7.64e-02    0.14   0.8884    
## carManufacturerisuzu       1.08e-01   6.86e-02    1.58   0.1163    
## carManufacturerjaguar      1.63e-01   8.03e-02    2.03   0.0440 *  
## carManufacturermazda       5.82e-02   5.77e-02    1.01   0.3146    
## carManufacturermercury     7.26e-02   9.40e-02    0.77   0.4408    
## carManufacturermitsubishi  9.69e-02   6.24e-02    1.55   0.1223    
## carManufacturernissan      1.03e-01   5.56e-02    1.85   0.0662 .  
## carManufacturerpeugeot     2.56e-02   1.14e-01    0.22   0.8230    
## carManufacturerplymouth    1.12e-01   6.32e-02    1.76   0.0795 .  
## carManufacturerporsche     6.57e-02   9.30e-02    0.71   0.4811    
## carManufacturerrenault     1.02e-01   7.50e-02    1.36   0.1759    
## carManufacturersaab        9.94e-02   6.25e-02    1.59   0.1136    
## carManufacturersubaru     -4.54e-02   1.08e-01   -0.42   0.6753    
## carManufacturertoyota      8.10e-02   5.22e-02    1.55   0.1225    
## carManufacturervolkswagen  8.25e-02   5.87e-02    1.41   0.1617    
## carManufacturervolvo       1.33e-01   6.55e-02    2.04   0.0433 *  
## curbweight                -2.37e-04   3.91e-05   -6.07  9.1e-09 ***
## enginetypedohcv            2.94e-01   1.03e-01    2.85   0.0050 ** 
## enginetypel                6.58e-02   9.65e-02    0.68   0.4967    
## enginetypeohc              1.61e-02   3.24e-02    0.50   0.6190    
## enginetypeohcf             4.42e-02   9.39e-02    0.47   0.6387    
## enginetypeohcv            -2.46e-02   3.93e-02   -0.62   0.5330    
## enginetyperotor           -2.68e-01   8.32e-02   -3.22   0.0016 ** 
## enginesize                -1.40e-03   5.03e-04   -2.78   0.0061 ** 
## fuelsystem2bbl            -1.09e-01   5.60e-02   -1.95   0.0532 .  
## fuelsystem4bbl            -1.76e-01   9.93e-02   -1.77   0.0782 .  
## fuelsystemidi              1.22e-01   6.26e-02    1.94   0.0540 .  
## fuelsystemmfi             -2.01e-01   9.68e-02   -2.08   0.0394 *  
## fuelsystemmpfi            -1.34e-01   5.80e-02   -2.31   0.0223 *  
## fuelsystemspdi            -1.80e-01   6.73e-02   -2.67   0.0084 ** 
## fuelsystemspfi            -1.85e-01   1.02e-01   -1.82   0.0714 .  
## peakrpm                   -7.29e-05   2.08e-05   -3.50   0.0006 ***
## wheelbase                 -2.43e-03   2.52e-03   -0.96   0.3370    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0699 on 160 degrees of freedom
## Multiple R-squared:  0.924,  Adjusted R-squared:  0.903 
## F-statistic: 44.1 on 44 and 160 DF,  p-value: <2e-16
AIC(mdl.f_highwaympg3)
## [1] -467.6721
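
Since log(highwaympg/price) = log(highwaympg) - log(price) and all three models use the same predictors, each coefficient of the ratio model should equal the log(highwaympg) coefficient minus the log(price) coefficient. For example, the intercepts printed above give 5.17 - 7.74 = -2.57. The sketch below illustrates this property on the built-in `mtcars` data, which is only a stand-in for the car price dataset.

```r
# Illustrative data only: mtcars stands in for the car price dataset
fit_ratio <- lm(log(mpg / hp) ~ wt + qsec, data = mtcars)
fit_num <- lm(log(mpg) ~ wt + qsec, data = mtcars)
fit_den <- lm(log(hp) ~ wt + qsec, data = mtcars)

# OLS is linear in the response, so the coefficients subtract term by term
coef_diff <- coef(fit_num) - coef(fit_den)
all.equal(coef(fit_ratio), coef_diff)
```
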

4 Exploratory work: Pricing Assistant

get_price_prediction_report <- function(model,
                                        aspiration,
                                        carbody,
                                        carheight,
                                        carManufacturer,
                                        curbweight,
                                        enginetype,
                                        enginesize,
                                        fuelsystem,
                                        peakrpm,
                                        wheelbase,
                                        level = 0.95,
                                        smearing_correct = TRUE,
                                        currency = "$",
                                        quiet = FALSE) {
  new_car <- data.frame(
    aspiration = aspiration,
    carbody = carbody,
    carheight = carheight,
    carManufacturer = carManufacturer,
    curbweight = curbweight,
    enginetype = enginetype,
    enginesize = enginesize,
    fuelsystem = fuelsystem,
    peakrpm = peakrpm,
    wheelbase = wheelbase,
    stringsAsFactors = FALSE
  )

  if (!is.null(model$xlevels)) {
    for (nm in names(model$xlevels)) {
      new_car[[nm]] <- factor(new_car[[nm]], levels = model$xlevels[[nm]])
    }
    bad <- vapply(names(model$xlevels), function(nm) any(is.na(new_car[[nm]])), logical(1))
    if (any(bad)) {
      stop(
        "Unseen factor level(s): ",
        paste(names(model$xlevels)[bad], collapse = ", "),
        ". Use choices from model$xlevels."
      )
    }
  }

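  # Prediction (not confidence) interval on the log scale
  pred_log <- predict(model, newdata = new_car, interval = "prediction", level = level)

  # Smearing factor: corrects the retransformation bias of exp(fitted log-price)
  smear <- if (smearing_correct) mean(exp(residuals(model))) else 1

  # Only the point estimate is smeared; exp() of the interval limits is already
  # a valid interval for price because exp() is monotone
  fit_price <- exp(pred_log[1, "fit"]) * smear
  lwr_price <- exp(pred_log[1, "lwr"])
  upr_price <- exp(pred_log[1, "upr"])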

  if (!quiet) {
    desc <- sprintf(
      "%s %s | enginetype=%s, fuelsystem=%s, enginesize=%s, curbweight=%s, carheight=%s, peakrpm=%s, wheelbase=%s",
      new_car$carManufacturer, new_car$carbody,
      new_car$enginetype, new_car$fuelsystem, new_car$enginesize,
      new_car$curbweight, new_car$carheight, new_car$peakrpm, new_car$wheelbase
    )
    cat(sprintf(
      "\nBased on the log-price model, for %s:\n  • Point estimate: %s%.0f\n  • %d%% prediction interval: [%s%.0f, %s%.0f]\n",
      desc, currency, fit_price, round(level * 100), currency, lwr_price, currency, upr_price
    ))
  }

  invisible(data.frame(
    predicted_price = fit_price,
    lower = lwr_price,
    upper = upr_price,
    level = level
  ))
}
# Example 1
get_price_prediction_report(
  model = lprice3,
  aspiration = "std",
  carbody = "sedan",
  carheight = 52,
  carManufacturer = "toyota",
  curbweight = 2500,
  enginetype = "ohc",
  enginesize = 120,
  fuelsystem = "mpfi",
  peakrpm = 5000,
  wheelbase = 97
)
## 
## Based on the log-price model, for toyota sedan | enginetype=ohc, fuelsystem=mpfi, enginesize=120, curbweight=2500, carheight=52, peakrpm=5000, wheelbase=97:
##   • Point estimate: $10667
##   • 95% prediction interval: [$8407, $13401]
# Example 2
get_price_prediction_report(
  model = lprice3,
  aspiration = "turbo",
  carbody = "hardtop",
  carheight = 50,
  carManufacturer = "porsche",
  curbweight = 3000,
  enginetype = "dohc",
  enginesize = 180,
  fuelsystem = "mpfi",
  peakrpm = 6000,
  wheelbase = 98
)
## 
## Based on the log-price model, for porsche hardtop | enginetype=dohc, fuelsystem=mpfi, enginesize=180, curbweight=3000, carheight=50, peakrpm=6000, wheelbase=98:
##   • Point estimate: $32839
##   • 95% prediction interval: [$22945, $46540]
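
The point estimates above are multiplied by a smearing factor, `mean(exp(residuals(model)))`, before being reported, whereas the interval limits are simply exponentiated. The sketch below illustrates why such a factor is used. It runs on simulated data, not on `car_price.csv`, and the names `x`, `y`, `fit_log` and `new_x` are assumptions made for the illustration only.

```r
# Minimal sketch of a smearing retransformation on simulated data.
# All data and object names here are illustrative, not taken from the report.
set.seed(1)
n <- 500
x <- runif(n, 0, 2)
y <- exp(1 + 0.5 * x + rnorm(n, sd = 0.4))  # log-normal response

fit_log <- lm(log(y) ~ x)

# Smearing factor: mean of the exponentiated residuals.
# With an intercept the residuals average zero, so by Jensen's
# inequality this factor is at least 1.
smear <- mean(exp(residuals(fit_log)))

new_x <- data.frame(x = 1)
naive_mean   <- unname(exp(predict(fit_log, newdata = new_x)))  # back-transformed fit only
smeared_mean <- naive_mean * smear                              # bias-corrected mean estimate

c(naive = naive_mean, smeared = smeared_mean, factor = smear)
```

Exponentiating the interval limits without the factor is a reasonable choice, because quantiles are preserved under a monotone transformation such as `exp()`, while the mean is not.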

4.1 Shiny implementation

The Shiny implementation is still a work in progress; the current experimental version is linked in Section 1.2.

5 Code for appendix

lmpg1 <- lm(formula = log(highwaympg/price) ~ aspiration +
       carbody +
       carheight +
       carManufacturer +
       compressionratio +
       curbweight +
       cylinderNum +
       carwidth +
       enginelocation +
       enginetype +
       enginesize +
       fuelsystem +
       peakrpm +
       stroke +
       wheelbase +
       horsepower,
     data = df)

summary(lmpg1)
## 
## Call:
## lm(formula = log(highwaympg/price) ~ aspiration + carbody + carheight + 
##     carManufacturer + compressionratio + curbweight + cylinderNum + 
##     carwidth + enginelocation + enginetype + enginesize + fuelsystem + 
##     peakrpm + stroke + wheelbase + horsepower, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30655 -0.07567 -0.00100  0.07592  0.32212 
## 
## Coefficients: (2 not defined because of singularities)
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -3.167e+00  9.935e-01  -3.188 0.001738 ** 
## aspirationturbo           -1.463e-01  5.633e-02  -2.598 0.010291 *  
## carbodyhardtop             1.721e-01  8.366e-02   2.057 0.041398 *  
## carbodyhatchback           2.443e-01  7.807e-02   3.129 0.002099 ** 
## carbodysedan               2.064e-01  7.893e-02   2.615 0.009811 ** 
## carbodywagon               2.419e-01  8.584e-02   2.818 0.005468 ** 
## carheight                  2.232e-02  9.233e-03   2.417 0.016812 *  
## carManufactureraudi        1.011e-02  1.483e-01   0.068 0.945728    
## carManufacturerbmw        -1.750e-01  1.403e-01  -1.247 0.214158    
## carManufacturerbuick       1.846e-01  1.631e-01   1.132 0.259428    
## carManufacturerchevrolet   4.549e-01  1.438e-01   3.163 0.001885 ** 
## carManufacturerdodge       3.990e-01  1.231e-01   3.241 0.001460 ** 
## carManufacturerhonda       3.183e-02  1.508e-01   0.211 0.833138    
## carManufacturerisuzu       2.130e-01  1.321e-01   1.612 0.109069    
## carManufacturerjaguar      4.124e-01  1.593e-01   2.589 0.010567 *  
## carManufacturermazda       1.683e-01  1.118e-01   1.505 0.134508    
## carManufacturermercury     2.507e-01  1.828e-01   1.371 0.172261    
## carManufacturermitsubishi  4.643e-01  1.224e-01   3.792 0.000214 ***
## carManufacturernissan      2.542e-01  1.082e-01   2.348 0.020153 *  
## carManufacturerpeugeot     9.500e-02  2.648e-01   0.359 0.720272    
## carManufacturerplymouth    4.305e-01  1.243e-01   3.463 0.000693 ***
## carManufacturerporsche    -2.739e-01  1.799e-01  -1.523 0.129855    
## carManufacturerrenault     3.335e-01  1.554e-01   2.145 0.033503 *  
## carManufacturersaab        6.646e-02  1.243e-01   0.535 0.593687    
## carManufacturersubaru      2.860e-01  1.096e-01   2.609 0.009990 ** 
## carManufacturertoyota      3.154e-01  1.014e-01   3.110 0.002231 ** 
## carManufacturervolkswagen  2.159e-01  1.178e-01   1.833 0.068750 .  
## carManufacturervolvo       2.337e-01  1.297e-01   1.802 0.073455 .  
## compressionratio           6.497e-02  2.599e-02   2.500 0.013483 *  
## curbweight                -7.338e-04  8.952e-05  -8.197 9.22e-14 ***
## cylinderNumfour            5.106e-02  9.722e-02   0.525 0.600196    
## cylinderNumgeq_eight      -2.566e-01  1.576e-01  -1.628 0.105641    
## cylinderNumleq_three      -4.678e-01  1.984e-01  -2.358 0.019628 *  
## cylinderNumsix            -2.902e-02  1.133e-01  -0.256 0.798189    
## carwidth                  -6.919e-03  1.522e-02  -0.455 0.650047    
## enginelocationrear        -2.790e-01  1.892e-01  -1.475 0.142272    
## enginetypedohcv            7.366e-01  2.796e-01   2.634 0.009302 ** 
## enginetypel                3.902e-01  2.326e-01   1.677 0.095499 .  
## enginetypeohc              4.051e-03  7.281e-02   0.056 0.955706    
## enginetypeohcf                    NA         NA      NA       NA    
## enginetypeohcv             6.259e-02  8.599e-02   0.728 0.467835    
## enginetyperotor                   NA         NA      NA       NA    
## enginesize                -1.513e-03  1.476e-03  -1.025 0.306948    
## fuelsystem2bbl            -2.239e-01  1.046e-01  -2.140 0.033939 *  
## fuelsystem4bbl            -1.828e-01  1.847e-01  -0.990 0.323843    
## fuelsystemidi             -9.496e-01  3.803e-01  -2.497 0.013577 *  
## fuelsystemmfi             -3.229e-01  1.846e-01  -1.749 0.082300 .  
## fuelsystemmpfi            -3.043e-01  1.111e-01  -2.738 0.006919 ** 
## fuelsystemspdi            -3.167e-01  1.290e-01  -2.454 0.015237 *  
## fuelsystemspfi            -2.251e-01  1.919e-01  -1.173 0.242699    
## peakrpm                   -1.031e-04  4.493e-05  -2.294 0.023149 *  
## stroke                     7.758e-02  6.606e-02   1.174 0.242070    
## wheelbase                 -1.891e-02  5.421e-03  -3.489 0.000634 ***
## horsepower                -1.002e-03  1.368e-03  -0.732 0.465102    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1294 on 153 degrees of freedom
## Multiple R-squared:  0.9742, Adjusted R-squared:  0.9656 
## F-statistic: 113.4 on 51 and 153 DF,  p-value: < 2.2e-16
print(AIC(lmpg1))
## [1] -210.7019
stepAIC(
  lmpg1,
  scope = list(lower = ~ carManufacturer,
               upper = ~ .),
  direction = 'both',
  trace = 1
)
## Start:  AIC=-794.47
## log(highwaympg/price) ~ aspiration + carbody + carheight + carManufacturer + 
##     compressionratio + curbweight + cylinderNum + carwidth + 
##     enginelocation + enginetype + enginesize + fuelsystem + peakrpm + 
##     stroke + wheelbase + horsepower
## 
## 
## Step:  AIC=-794.47
## log(highwaympg/price) ~ aspiration + carbody + carheight + carManufacturer + 
##     compressionratio + curbweight + cylinderNum + carwidth + 
##     enginetype + enginesize + fuelsystem + peakrpm + stroke + 
##     wheelbase + horsepower
## 
##                    Df Sum of Sq    RSS     AIC
## - cylinderNum       3   0.05024 2.6108 -796.48
## - carwidth          1   0.00346 2.5641 -796.19
## - horsepower        1   0.00898 2.5696 -795.75
## - enginesize        1   0.01759 2.5782 -795.06
## - stroke            1   0.02308 2.5837 -794.63
## <none>                          2.5606 -794.47
## - peakrpm           1   0.08807 2.6487 -789.53
## - fuelsystem        7   0.25503 2.8156 -789.00
## - carheight         1   0.09779 2.6584 -788.78
## - compressionratio  1   0.10458 2.6652 -788.26
## - aspiration        1   0.11297 2.6736 -787.62
## - carbody           4   0.19743 2.7580 -787.24
## - wheelbase         1   0.20370 2.7643 -780.78
## - enginetype        5   0.36170 2.9223 -777.38
## - curbweight        1   1.12458 3.6852 -721.83
## 
## Step:  AIC=-796.48
## log(highwaympg/price) ~ aspiration + carbody + carheight + carManufacturer + 
##     compressionratio + curbweight + carwidth + enginetype + enginesize + 
##     fuelsystem + peakrpm + stroke + wheelbase + horsepower
## 
##                    Df Sum of Sq    RSS     AIC
## - carwidth          1   0.00444 2.6153 -798.14
## - horsepower        1   0.00780 2.6186 -797.87
## <none>                          2.6108 -796.48
## + cylinderNum       3   0.05024 2.5606 -794.47
## - stroke            1   0.05198 2.6628 -794.44
## - enginesize        1   0.07428 2.6851 -792.73
## - fuelsystem        7   0.23871 2.8496 -792.55
## - peakrpm           1   0.08586 2.6967 -791.85
## - compressionratio  1   0.09105 2.7019 -791.46
## - carheight         1   0.09777 2.7086 -790.95
## - aspiration        1   0.12746 2.7383 -788.71
## - carbody           4   0.22035 2.8312 -787.87
## - wheelbase         1   0.20031 2.8112 -783.33
## - enginetype        6   0.61442 3.2253 -765.16
## - curbweight        1   1.21710 3.8280 -720.04
## 
## Step:  AIC=-798.14
## log(highwaympg/price) ~ aspiration + carbody + carheight + carManufacturer + 
##     compressionratio + curbweight + enginetype + enginesize + 
##     fuelsystem + peakrpm + stroke + wheelbase + horsepower
## 
##                    Df Sum of Sq    RSS     AIC
## - horsepower        1   0.00921 2.6245 -799.41
## <none>                          2.6153 -798.14
## + carwidth          1   0.00444 2.6108 -796.48
## - stroke            1   0.05064 2.6659 -796.20
## + cylinderNum       3   0.05122 2.5641 -796.19
## - fuelsystem        7   0.23428 2.8496 -794.55
## - enginesize        1   0.07438 2.6897 -794.39
## - peakrpm           1   0.08588 2.7012 -793.51
## - compressionratio  1   0.08687 2.7022 -793.44
## - carheight         1   0.11791 2.7332 -791.09
## - aspiration        1   0.13017 2.7455 -790.18
## - carbody           4   0.22040 2.8357 -789.55
## - wheelbase         1   0.29518 2.9105 -778.21
## - enginetype        6   0.65119 3.2665 -764.56
## - curbweight        1   1.36715 3.9824 -713.93
## 
## Step:  AIC=-799.41
## log(highwaympg/price) ~ aspiration + carbody + carheight + carManufacturer + 
##     compressionratio + curbweight + enginetype + enginesize + 
##     fuelsystem + peakrpm + stroke + wheelbase
## 
##                    Df Sum of Sq    RSS     AIC
## <none>                          2.6245 -799.41
## + horsepower        1   0.00921 2.6153 -798.14
## + carwidth          1   0.00586 2.6186 -797.87
## - stroke            1   0.05142 2.6759 -797.44
## + cylinderNum       3   0.05033 2.5742 -797.38
## - compressionratio  1   0.08592 2.7104 -794.81
## - fuelsystem        7   0.26708 2.8916 -793.55
## - carheight         1   0.12622 2.7507 -791.78
## - carbody           4   0.21120 2.8357 -791.55
## - peakrpm           1   0.14119 2.7657 -790.67
## - enginesize        1   0.20062 2.8251 -786.31
## - wheelbase         1   0.28599 2.9105 -780.21
## - aspiration        1   0.30741 2.9319 -778.71
## - enginetype        6   0.64395 3.2685 -766.43
## - curbweight        1   1.65111 4.2756 -701.37
## 
## Call:
## lm(formula = log(highwaympg/price) ~ aspiration + carbody + carheight + 
##     carManufacturer + compressionratio + curbweight + enginetype + 
##     enginesize + fuelsystem + peakrpm + stroke + wheelbase, data = df)
## 
## Coefficients:
##               (Intercept)            aspirationturbo             carbodyhardtop           carbodyhatchback  
##                -3.4757348                 -0.1806168                  0.1830444                  0.2378186  
##              carbodysedan               carbodywagon                  carheight        carManufactureraudi  
##                 0.1959691                  0.2286335                  0.0242511                 -0.0890008  
##        carManufacturerbmw       carManufacturerbuick   carManufacturerchevrolet       carManufacturerdodge  
##                -0.2344905                  0.0877627                  0.4255598                  0.3664775  
##      carManufacturerhonda       carManufacturerisuzu      carManufacturerjaguar       carManufacturermazda  
##                -0.0231124                  0.1918243                  0.4082793                  0.1285910  
##    carManufacturermercury  carManufacturermitsubishi      carManufacturernissan     carManufacturerpeugeot  
##                 0.1826551                  0.4341002                  0.2320999                  0.5890716  
##   carManufacturerplymouth     carManufacturerporsche     carManufacturerrenault        carManufacturersaab  
##                 0.4037851                 -0.3034015                  0.3017632                  0.0274612  
##     carManufacturersubaru      carManufacturertoyota  carManufacturervolkswagen       carManufacturervolvo  
##                 0.5973606                  0.2768099                  0.1604704                  0.2005878  
##          compressionratio                 curbweight            enginetypedohcv                enginetypel  
##                 0.0573865                 -0.0007277                  0.3821056                 -0.1134962  
##             enginetypeohc             enginetypeohcf             enginetypeohcv            enginetyperotor  
##                 0.0325425                 -0.3025278                  0.0101005                 -0.5791131  
##                enginesize             fuelsystem2bbl             fuelsystem4bbl              fuelsystemidi  
##                -0.0033177                 -0.2394617                 -0.1807830                 -0.8276278  
##             fuelsystemmfi             fuelsystemmpfi             fuelsystemspdi             fuelsystemspfi  
##                -0.3265925                 -0.3189251                 -0.3380763                 -0.2436716  
##                   peakrpm                     stroke                  wheelbase  
##                -0.0001133                  0.1073458                 -0.0193328
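
The two AIC figures reported for `lmpg1` can look inconsistent: `AIC(lmpg1)` prints -210.7019, while `stepAIC()` starts from `AIC=-794.47`. `stepAIC()` relies on `extractAIC()`, which for a linear model drops the Gaussian constant and does not count the error variance as a parameter, so the two scales differ by `n * (1 + log(2 * pi)) + 2`. With the 205 observations used here that offset is roughly 583.8, which matches the gap between the two figures. The sketch below checks the identity on the built-in `mtcars` data, used purely as a stand-in.

```r
# Sketch on the built-in mtcars data (illustrative; not the car_price data).
fit <- lm(log(mpg) ~ wt + hp + qsec, data = mtcars)
n <- nobs(fit)

aic_full <- AIC(fit)            # -2 * logLik + 2 * (p + 1), including the Gaussian constant
aic_step <- extractAIC(fit)[2]  # n * log(RSS / n) + 2 * p, the scale printed by stepAIC()

# The two scales differ by a constant that depends only on n
offset <- n * (1 + log(2 * pi)) + 2
all.equal(aic_full - aic_step, offset)
```

Because the offset depends only on the sample size, model rankings are the same on either scale as long as every model is fitted to the same rows.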
lmpg2 <- lm(
  formula = log(highwaympg / price) ~ aspiration + carbody + carheight +
    carManufacturer + compressionratio + curbweight + enginetype +
    enginesize + fuelsystem + peakrpm + stroke + wheelbase,
  data = df
)

summary(lmpg2) # adjusted R^2: 0.9659
## 
## Call:
## lm(formula = log(highwaympg/price) ~ aspiration + carbody + carheight + 
##     carManufacturer + compressionratio + curbweight + enginetype + 
##     enginesize + fuelsystem + peakrpm + stroke + wheelbase, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32911 -0.08305  0.00000  0.07236  0.33359 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -3.476e+00  6.408e-01  -5.424 2.15e-07 ***
## aspirationturbo           -1.806e-01  4.199e-02  -4.302 2.96e-05 ***
## carbodyhardtop             1.830e-01  8.071e-02   2.268 0.024683 *  
## carbodyhatchback           2.378e-01  7.342e-02   3.239 0.001462 ** 
## carbodysedan               1.960e-01  7.504e-02   2.612 0.009882 ** 
## carbodywagon               2.286e-01  8.287e-02   2.759 0.006482 ** 
## carheight                  2.425e-02  8.797e-03   2.757 0.006528 ** 
## carManufactureraudi       -8.900e-02  1.222e-01  -0.728 0.467622    
## carManufacturerbmw        -2.345e-01  1.230e-01  -1.906 0.058460 .  
## carManufacturerbuick       8.776e-02  1.274e-01   0.689 0.492011    
## carManufacturerchevrolet   4.256e-01  1.396e-01   3.048 0.002705 ** 
## carManufacturerdodge       3.665e-01  1.160e-01   3.158 0.001903 ** 
## carManufacturerhonda      -2.311e-02  1.452e-01  -0.159 0.873759    
## carManufacturerisuzu       1.918e-01  1.272e-01   1.507 0.133686    
## carManufacturerjaguar      4.083e-01  1.499e-01   2.724 0.007172 ** 
## carManufacturermazda       1.286e-01  1.067e-01   1.205 0.230020    
## carManufacturermercury     1.827e-01  1.744e-01   1.047 0.296540    
## carManufacturermitsubishi  4.341e-01  1.173e-01   3.700 0.000297 ***
## carManufacturernissan      2.321e-01  1.046e-01   2.218 0.027952 *  
## carManufacturerpeugeot     5.891e-01  2.120e-01   2.778 0.006130 ** 
## carManufacturerplymouth    4.038e-01  1.186e-01   3.403 0.000844 ***
## carManufacturerporsche    -3.034e-01  1.728e-01  -1.756 0.081009 .  
## carManufacturerrenault     3.018e-01  1.450e-01   2.080 0.039095 *  
## carManufacturersaab        2.746e-02  1.192e-01   0.230 0.818051    
## carManufacturersubaru      5.974e-01  1.995e-01   2.995 0.003189 ** 
## carManufacturertoyota      2.768e-01  9.659e-02   2.866 0.004728 ** 
## carManufacturervolkswagen  1.605e-01  1.098e-01   1.461 0.146011    
## carManufacturervolvo       2.006e-01  1.229e-01   1.632 0.104665    
## compressionratio           5.739e-02  2.523e-02   2.274 0.024292 *  
## curbweight                -7.277e-04  7.299e-05  -9.970  < 2e-16 ***
## enginetypedohcv            3.821e-01  1.914e-01   1.997 0.047579 *  
## enginetypel               -1.135e-01  1.780e-01  -0.638 0.524720    
## enginetypeohc              3.254e-02  5.990e-02   0.543 0.587698    
## enginetypeohcf            -3.025e-01  1.743e-01  -1.736 0.084563 .  
## enginetypeohcv             1.010e-02  7.839e-02   0.129 0.897646    
## enginetyperotor           -5.791e-01  1.543e-01  -3.752 0.000246 ***
## enginesize                -3.318e-03  9.547e-04  -3.475 0.000659 ***
## fuelsystem2bbl            -2.395e-01  1.032e-01  -2.321 0.021539 *  
## fuelsystem4bbl            -1.808e-01  1.833e-01  -0.986 0.325467    
## fuelsystemidi             -8.276e-01  3.694e-01  -2.240 0.026468 *  
## fuelsystemmfi             -3.266e-01  1.802e-01  -1.812 0.071828 .  
## fuelsystemmpfi            -3.189e-01  1.073e-01  -2.972 0.003420 ** 
## fuelsystemspdi            -3.381e-01  1.260e-01  -2.682 0.008087 ** 
## fuelsystemspfi            -2.437e-01  1.877e-01  -1.298 0.196132    
## peakrpm                   -1.133e-04  3.885e-05  -2.915 0.004068 ** 
## stroke                     1.073e-01  6.101e-02   1.760 0.080428 .  
## wheelbase                 -1.933e-02  4.659e-03  -4.149 5.43e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1289 on 158 degrees of freedom
## Multiple R-squared:  0.9736, Adjusted R-squared:  0.9659 
## F-statistic: 126.6 on 46 and 158 DF,  p-value: < 2.2e-16
print(AIC(lmpg2))
## [1] -215.6496
lmpg3 <- lm(formula = log(highwaympg/price) ~ aspiration + carbody +
                wheelbase +
                carheight + curbweight +
                enginetype + enginesize +
                fuelsystem + peakrpm +
                carManufacturer, data = df)

summary(lmpg3) # adjusted R^2: 0.9649
## 
## Call:
## lm(formula = log(highwaympg/price) ~ aspiration + carbody + wheelbase + 
##     carheight + curbweight + enginetype + enginesize + fuelsystem + 
##     peakrpm + carManufacturer, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.28940 -0.07170  0.00000  0.07169  0.31380 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -2.569e+00  5.360e-01  -4.794 3.71e-06 ***
## aspirationturbo           -2.149e-01  3.848e-02  -5.583 9.90e-08 ***
## carbodyhardtop             2.114e-01  8.105e-02   2.608 0.009965 ** 
## carbodyhatchback           2.471e-01  7.414e-02   3.332 0.001070 ** 
## carbodysedan               2.088e-01  7.591e-02   2.751 0.006624 ** 
## carbodywagon               2.428e-01  8.385e-02   2.895 0.004317 ** 
## wheelbase                 -1.839e-02  4.713e-03  -3.902 0.000140 ***
## carheight                  2.196e-02  8.651e-03   2.538 0.012096 *  
## curbweight                -7.458e-04  7.307e-05 -10.206  < 2e-16 ***
## enginetypedohcv            3.939e-01  1.932e-01   2.039 0.043128 *  
## enginetypel               -1.278e-01  1.805e-01  -0.708 0.480008    
## enginetypeohc              2.242e-02  6.059e-02   0.370 0.711876    
## enginetypeohcf            -3.479e-01  1.755e-01  -1.982 0.049200 *  
## enginetypeohcv            -2.205e-02  7.349e-02  -0.300 0.764490    
## enginetyperotor           -5.336e-01  1.555e-01  -3.432 0.000762 ***
## enginesize                -3.014e-03  9.399e-04  -3.207 0.001622 ** 
## fuelsystem2bbl            -2.387e-01  1.046e-01  -2.281 0.023880 *  
## fuelsystem4bbl            -1.831e-01  1.856e-01  -0.987 0.325317    
## fuelsystemidi             -1.462e-02  1.170e-01  -0.125 0.900778    
## fuelsystemmfi             -3.520e-01  1.811e-01  -1.944 0.053653 .  
## fuelsystemmpfi            -3.221e-01  1.085e-01  -2.968 0.003454 ** 
## fuelsystemspdi            -3.751e-01  1.258e-01  -2.981 0.003322 ** 
## fuelsystemspfi            -2.318e-01  1.903e-01  -1.218 0.224886    
## peakrpm                   -1.188e-04  3.893e-05  -3.052 0.002663 ** 
## carManufactureraudi       -6.785e-02  1.236e-01  -0.549 0.583781    
## carManufacturerbmw        -2.493e-01  1.238e-01  -2.015 0.045607 *  
## carManufacturerbuick       6.494e-02  1.286e-01   0.505 0.614254    
## carManufacturerchevrolet   4.692e-01  1.405e-01   3.341 0.001040 ** 
## carManufacturerdodge       3.969e-01  1.156e-01   3.435 0.000754 ***
## carManufacturerhonda       4.315e-02  1.428e-01   0.302 0.762904    
## carManufacturerisuzu       2.176e-01  1.283e-01   1.696 0.091816 .  
## carManufacturerjaguar      4.587e-01  1.501e-01   3.056 0.002625 ** 
## carManufacturermazda       1.377e-01  1.078e-01   1.277 0.203517    
## carManufacturermercury     1.754e-01  1.757e-01   0.998 0.319796    
## carManufacturermitsubishi  4.633e-01  1.167e-01   3.972 0.000108 ***
## carManufacturernissan      2.714e-01  1.039e-01   2.611 0.009872 ** 
## carManufacturerpeugeot     5.474e-01  2.140e-01   2.558 0.011468 *  
## carManufacturerplymouth    4.432e-01  1.182e-01   3.751 0.000245 ***
## carManufacturerporsche    -2.657e-01  1.739e-01  -1.528 0.128576    
## carManufacturerrenault     3.794e-01  1.402e-01   2.706 0.007540 ** 
## carManufacturersaab        4.888e-02  1.169e-01   0.418 0.676313    
## carManufacturersubaru      5.856e-01  2.023e-01   2.895 0.004319 ** 
## carManufacturertoyota      3.009e-01  9.756e-02   3.084 0.002404 ** 
## carManufacturervolkswagen  2.093e-01  1.097e-01   1.908 0.058157 .  
## carManufacturervolvo       2.322e-01  1.225e-01   1.895 0.059862 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1308 on 160 degrees of freedom
## Multiple R-squared:  0.9725, Adjusted R-squared:  0.9649 
## F-statistic: 128.4 on 44 and 160 DF,  p-value: < 2.2e-16
print(AIC(lmpg3))
## [1] -211.0849
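
The `lmpg` models use the response `log(highwaympg/price)`, which equals `log(highwaympg) - log(price)`. Since least-squares coefficients are linear in the response, fitting the ratio model is equivalent to fitting the two log responses separately on the same predictors and subtracting the coefficients. This may help explain why many coefficients here (for example `curbweight`) have the opposite sign to those in the `log(price)` models that follow. The sketch below checks the identity on `mtcars`, where `mpg / hp` is only a stand-in for `highwaympg / price`.

```r
# Sketch on mtcars; mpg / hp stands in for highwaympg / price (illustrative only).
f_ratio <- lm(log(mpg / hp) ~ wt + qsec, data = mtcars)
f_num   <- lm(log(mpg) ~ wt + qsec, data = mtcars)
f_den   <- lm(log(hp) ~ wt + qsec, data = mtcars)

# With identical predictors, the ratio model's coefficients equal the
# difference between the numerator and denominator fits.
coef_diff <- coef(f_num) - coef(f_den)
all.equal(coef(f_ratio), coef_diff)
```

The identity holds exactly only when both fits share the same predictors and rows, so it is a guide to interpretation rather than a way of reproducing the tables in this appendix.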
lprice1 <- lm(formula = log(price) ~ aspiration + carbody + carheight +
       carManufacturer + compressionratio + curbweight + enginetype +
       enginesize + fuelsystem + peakrpm + stroke + wheelbase + horsepower,
     data = df)

summary(lprice1)
## 
## Call:
## lm(formula = log(price) ~ aspiration + carbody + carheight + 
##     carManufacturer + compressionratio + curbweight + enginetype + 
##     enginesize + fuelsystem + peakrpm + stroke + wheelbase + 
##     horsepower, data = df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.315430 -0.063392  0.000257  0.061065  0.307662 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                8.032e+00  5.652e-01  14.210  < 2e-16 ***
## aspirationturbo            1.000e-01  4.852e-02   2.061 0.040950 *  
## carbodyhardtop            -1.853e-01  7.188e-02  -2.578 0.010849 *  
## carbodyhatchback          -2.158e-01  6.651e-02  -3.245 0.001435 ** 
## carbodysedan              -1.670e-01  6.784e-02  -2.461 0.014927 *  
## carbodywagon              -1.972e-01  7.442e-02  -2.649 0.008891 ** 
## carheight                 -2.889e-02  7.796e-03  -3.705 0.000292 ***
## carManufactureraudi        7.160e-02  1.086e-01   0.659 0.510583    
## carManufacturerbmw         3.067e-01  1.086e-01   2.826 0.005334 ** 
## carManufacturerbuick       3.648e-03  1.189e-01   0.031 0.975552    
## carManufacturerchevrolet  -2.378e-01  1.233e-01  -1.928 0.055607 .  
## carManufacturerdodge      -2.843e-01  1.036e-01  -2.743 0.006787 ** 
## carManufacturerhonda      -4.361e-04  1.281e-01  -0.003 0.997289    
## carManufacturerisuzu      -9.361e-02  1.126e-01  -0.832 0.406896    
## carManufacturerjaguar     -2.663e-01  1.375e-01  -1.937 0.054529 .  
## carManufacturermazda      -7.100e-02  9.432e-02  -0.753 0.452730    
## carManufacturermercury    -1.221e-01  1.561e-01  -0.782 0.435448    
## carManufacturermitsubishi -3.448e-01  1.044e-01  -3.303 0.001185 ** 
## carManufacturernissan     -1.478e-01  9.254e-02  -1.597 0.112345    
## carManufacturerpeugeot    -5.275e-01  1.892e-01  -2.788 0.005964 ** 
## carManufacturerplymouth   -3.068e-01  1.058e-01  -2.898 0.004290 ** 
## carManufacturerporsche     3.413e-01  1.523e-01   2.241 0.026409 *  
## carManufacturerrenault    -2.280e-01  1.322e-01  -1.725 0.086503 .  
## carManufacturersaab        5.065e-02  1.050e-01   0.482 0.630357    
## carManufacturersubaru     -6.155e-01  1.829e-01  -3.365 0.000963 ***
## carManufacturertoyota     -2.085e-01  8.543e-02  -2.441 0.015777 *  
## carManufacturervolkswagen -1.032e-01  9.768e-02  -1.056 0.292520    
## carManufacturervolvo      -8.711e-02  1.092e-01  -0.797 0.426436    
## compressionratio          -1.672e-02  2.224e-02  -0.752 0.453241    
## curbweight                 4.949e-04  6.871e-05   7.203 2.31e-11 ***
## enginetypedohcv           -1.464e-01  2.014e-01  -0.727 0.468406    
## enginetypel                1.900e-01  1.569e-01   1.211 0.227667    
## enginetypeohc             -5.881e-03  5.334e-02  -0.110 0.912348    
## enginetypeohcf             3.583e-01  1.577e-01   2.272 0.024446 *  
## enginetypeohcv            -1.833e-02  6.935e-02  -0.264 0.791867    
## enginetyperotor            2.673e-01  1.413e-01   1.892 0.060302 .  
## enginesize                 1.463e-03  1.121e-03   1.305 0.193926    
## fuelsystem2bbl             1.268e-01  9.134e-02   1.388 0.167154    
## fuelsystem4bbl             1.358e-02  1.618e-01   0.084 0.933240    
## fuelsystemidi              3.825e-01  3.262e-01   1.173 0.242713    
## fuelsystemmfi              1.432e-01  1.593e-01   0.899 0.369913    
## fuelsystemmpfi             1.809e-01  9.671e-02   1.871 0.063228 .  
## fuelsystemspdi             1.819e-01  1.118e-01   1.627 0.105761    
## fuelsystemspfi             4.941e-02  1.654e-01   0.299 0.765575    
## peakrpm                    3.526e-05  3.829e-05   0.921 0.358584    
## stroke                    -4.846e-02  5.378e-02  -0.901 0.368901    
## wheelbase                  1.657e-02  4.166e-03   3.978 0.000106 ***
## horsepower                 4.965e-04  1.186e-03   0.419 0.675943    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1136 on 157 degrees of freedom
## Multiple R-squared:  0.9609, Adjusted R-squared:  0.9492 
## F-statistic: 82.04 on 47 and 157 DF,  p-value: < 2.2e-16
print(AIC(lprice1))
## [1] -266.7262
stepAIC(
  lm(formula = log(price) ~ aspiration +
       carbody +
       carheight +
       carManufacturer +
       compressionratio +
       curbweight +
       cylinderNum +
       carwidth +
       enginelocation +
       enginetype +
       enginesize +
       fuelsystem +
       peakrpm +
       stroke +
       wheelbase,
     data = df),
  scope = list(lower = ~ carManufacturer,
               upper = ~ .),
  direction = 'both',
  trace = 1
)
## Start:  AIC=-850.28
## log(price) ~ aspiration + carbody + carheight + carManufacturer + 
##     compressionratio + curbweight + cylinderNum + carwidth + 
##     enginelocation + enginetype + enginesize + fuelsystem + peakrpm + 
##     stroke + wheelbase
## 
## 
## Step:  AIC=-850.28
## log(price) ~ aspiration + carbody + carheight + carManufacturer + 
##     compressionratio + curbweight + cylinderNum + carwidth + 
##     enginetype + enginesize + fuelsystem + peakrpm + stroke + 
##     wheelbase
## 
##                    Df Sum of Sq    RSS     AIC
## - cylinderNum       3   0.02352 1.9929 -853.85
## - stroke            1   0.00997 1.9794 -851.25
## - fuelsystem        7   0.13238 2.1018 -850.95
## - enginesize        1   0.01740 1.9868 -850.48
## - compressionratio  1   0.01839 1.9878 -850.38
## <none>                          1.9694 -850.28
## - peakrpm           1   0.02265 1.9921 -849.94
## - carwidth          1   0.04158 2.0110 -848.00
## - enginetype        5   0.16737 2.1368 -843.56
## - wheelbase         1   0.08722 2.0566 -843.40
## - aspiration        1   0.09413 2.0635 -842.71
## - carbody           4   0.15901 2.1284 -842.37
## - carheight         1   0.12939 2.0988 -839.24
## - curbweight        1   0.54600 2.5154 -802.12
## 
## Step:  AIC=-853.85
## log(price) ~ aspiration + carbody + carheight + carManufacturer + 
##     compressionratio + curbweight + carwidth + enginetype + enginesize + 
##     fuelsystem + peakrpm + stroke + wheelbase
## 
##                    Df Sum of Sq    RSS     AIC
## - fuelsystem        7   0.12848 2.1214 -855.04
## - stroke            1   0.01249 2.0054 -854.57
## - compressionratio  1   0.01325 2.0062 -854.49
## - peakrpm           1   0.01701 2.0099 -854.11
## <none>                          1.9929 -853.85
## - carwidth          1   0.03517 2.0281 -852.26
## - enginesize        1   0.04838 2.0413 -850.93
## + cylinderNum       3   0.02352 1.9694 -850.28
## - wheelbase         1   0.09765 2.0906 -846.04
## - aspiration        1   0.09879 2.0917 -845.93
## - carheight         1   0.12863 2.1216 -843.03
## - carbody           4   0.19246 2.1854 -842.95
## - enginetype        6   0.31435 2.3073 -835.82
## - curbweight        1   0.60694 2.5999 -801.35
## 
## Step:  AIC=-855.04
## log(price) ~ aspiration + carbody + carheight + carManufacturer + 
##     compressionratio + curbweight + carwidth + enginetype + enginesize + 
##     peakrpm + stroke + wheelbase
## 
##                    Df Sum of Sq    RSS     AIC
## - stroke            1   0.00257 2.1240 -856.79
## - compressionratio  1   0.01465 2.1361 -855.63
## <none>                          2.1214 -855.04
## - carwidth          1   0.02288 2.1443 -854.84
## - peakrpm           1   0.03214 2.1536 -853.96
## + fuelsystem        7   0.12848 1.9929 -853.85
## - enginesize        1   0.05237 2.1738 -852.04
## + cylinderNum       3   0.01963 2.1018 -850.95
## - wheelbase         1   0.13580 2.2572 -844.32
## - carbody           4   0.25848 2.3799 -839.47
## - carheight         1   0.19419 2.3156 -839.09
## - enginetype        6   0.32395 2.4453 -837.91
## - aspiration        1   0.25998 2.3814 -833.34
## - curbweight        1   0.83298 2.9544 -789.14
## 
## Step:  AIC=-856.79
## log(price) ~ aspiration + carbody + carheight + carManufacturer + 
##     compressionratio + curbweight + carwidth + enginetype + enginesize + 
##     peakrpm + wheelbase
## 
##                    Df Sum of Sq    RSS     AIC
## - compressionratio  1   0.01613 2.1401 -857.24
## <none>                          2.1240 -856.79
## - carwidth          1   0.02302 2.1470 -856.58
## - peakrpm           1   0.03558 2.1596 -855.39
## + stroke            1   0.00257 2.1214 -855.04
## + fuelsystem        7   0.11857 2.0054 -854.57
## - enginesize        1   0.04988 2.1739 -854.03
## + cylinderNum       3   0.01991 2.1041 -852.72
## - wheelbase         1   0.13431 2.2583 -846.22
## - carbody           4   0.26365 2.3876 -840.80
## - carheight         1   0.20187 2.3258 -840.18
## - enginetype        6   0.32399 2.4480 -839.69
## - aspiration        1   0.26014 2.3841 -835.11
## - curbweight        1   0.83215 2.9561 -791.02
## 
## Step:  AIC=-857.24
## log(price) ~ aspiration + carbody + carheight + carManufacturer + 
##     curbweight + carwidth + enginetype + enginesize + peakrpm + 
##     wheelbase
## 
##                    Df Sum of Sq    RSS     AIC
## <none>                          2.1401 -857.24
## - carwidth          1   0.02229 2.1624 -857.12
## + compressionratio  1   0.01613 2.1240 -856.79
## + fuelsystem        7   0.12648 2.0136 -855.73
## + stroke            1   0.00406 2.1361 -855.63
## + cylinderNum       3   0.02436 2.1157 -853.59
## - enginesize        1   0.06240 2.2025 -853.35
## - peakrpm           1   0.07123 2.2113 -852.53
## - wheelbase         1   0.13296 2.2731 -846.88
## - carbody           4   0.25979 2.3999 -841.75
## - carheight         1   0.19393 2.3340 -841.46
## - enginetype        6   0.31538 2.4555 -841.06
## - aspiration        1   0.24524 2.3854 -837.00
## - curbweight        1   0.82304 2.9632 -792.54
## 
## Call:
## lm(formula = log(price) ~ aspiration + carbody + carheight + 
##     carManufacturer + curbweight + carwidth + enginetype + enginesize + 
##     peakrpm + wheelbase, data = df)
## 
## Coefficients:
##               (Intercept)            aspirationturbo             carbodyhardtop           carbodyhatchback  
##                 6.884e+00                  1.266e-01                 -1.987e-01                 -2.396e-01  
##              carbodysedan               carbodywagon                  carheight        carManufactureraudi  
##                -1.815e-01                 -2.105e-01                 -2.949e-02                  5.042e-03  
##        carManufacturerbmw       carManufacturerbuick   carManufacturerchevrolet       carManufacturerdodge  
##                 3.080e-01                 -9.624e-02                 -2.884e-01                 -3.316e-01  
##      carManufacturerhonda       carManufacturerisuzu      carManufacturerjaguar       carManufacturermazda  
##                -1.888e-01                 -1.548e-01                 -3.551e-01                 -1.185e-01  
##    carManufacturermercury  carManufacturermitsubishi      carManufacturernissan     carManufacturerpeugeot  
##                -1.253e-01                 -3.862e-01                 -2.018e-01                 -6.200e-01  
##   carManufacturerplymouth     carManufacturerporsche     carManufacturerrenault        carManufacturersaab  
##                -3.529e-01                  2.799e-01                 -2.903e-01                  4.008e-02  
##     carManufacturersubaru      carManufacturertoyota  carManufacturervolkswagen       carManufacturervolvo  
##                -6.625e-01                 -2.284e-01                 -1.433e-01                 -1.332e-01  
##                curbweight                   carwidth            enginetypedohcv                enginetypel  
##                 5.036e-04                  1.684e-02                 -1.697e-01                  2.539e-01  
##             enginetypeohc             enginetypeohcf             enginetypeohcv            enginetyperotor  
##                -6.270e-03                  4.056e-01                  1.405e-03                  1.571e-01  
##                enginesize                    peakrpm                  wheelbase  
##                 1.728e-03                  7.001e-05                  1.445e-02
lprice2 <- lm(formula = log(price) ~ aspiration + carbody + carheight +
       carManufacturer + curbweight + carwidth + enginetype + enginesize +
       peakrpm + wheelbase, data = df)

summary(lprice2)
## 
## Call:
## lm(formula = log(price) ~ aspiration + carbody + carheight + 
##     carManufacturer + curbweight + carwidth + enginetype + enginesize + 
##     peakrpm + wheelbase, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27071 -0.06932  0.00373  0.06864  0.32793 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                6.884e+00  8.108e-01   8.491 1.10e-14 ***
## aspirationturbo            1.266e-01  2.902e-02   4.361 2.26e-05 ***
## carbodyhardtop            -1.987e-01  7.012e-02  -2.834 0.005163 ** 
## carbodyhatchback          -2.396e-01  6.332e-02  -3.784 0.000215 ***
## carbodysedan              -1.815e-01  6.481e-02  -2.800 0.005710 ** 
## carbodywagon              -2.105e-01  7.198e-02  -2.924 0.003934 ** 
## carheight                 -2.949e-02  7.603e-03  -3.878 0.000151 ***
## carManufactureraudi        5.042e-03  1.109e-01   0.045 0.963783    
## carManufacturerbmw         3.080e-01  1.069e-01   2.882 0.004475 ** 
## carManufacturerbuick      -9.624e-02  1.089e-01  -0.884 0.378217    
## carManufacturerchevrolet  -2.884e-01  1.205e-01  -2.395 0.017754 *  
## carManufacturerdodge      -3.316e-01  9.839e-02  -3.370 0.000935 ***
## carManufacturerhonda      -1.888e-01  9.903e-02  -1.906 0.058320 .  
## carManufacturerisuzu      -1.548e-01  1.042e-01  -1.485 0.139533    
## carManufacturerjaguar     -3.551e-01  1.257e-01  -2.826 0.005298 ** 
## carManufacturermazda      -1.185e-01  9.372e-02  -1.264 0.207975    
## carManufacturermercury    -1.253e-01  1.517e-01  -0.826 0.409942    
## carManufacturermitsubishi -3.862e-01  9.635e-02  -4.009 9.21e-05 ***
## carManufacturernissan     -2.018e-01  8.919e-02  -2.262 0.024967 *  
## carManufacturerpeugeot    -6.200e-01  1.851e-01  -3.350 0.001000 ***
## carManufacturerplymouth   -3.529e-01  1.000e-01  -3.528 0.000542 ***
## carManufacturerporsche     2.799e-01  1.531e-01   1.828 0.069341 .  
## carManufacturerrenault    -2.903e-01  1.223e-01  -2.373 0.018768 *  
## carManufacturersaab        4.008e-02  1.020e-01   0.393 0.695001    
## carManufacturersubaru     -6.625e-01  1.776e-01  -3.731 0.000262 ***
## carManufacturertoyota     -2.284e-01  8.459e-02  -2.700 0.007648 ** 
## carManufacturervolkswagen -1.433e-01  9.503e-02  -1.508 0.133491    
## carManufacturervolvo      -1.332e-01  1.061e-01  -1.256 0.210996    
## curbweight                 5.036e-04  6.303e-05   7.990 2.16e-13 ***
## carwidth                   1.684e-02  1.281e-02   1.315 0.190364    
## enginetypedohcv           -1.697e-01  1.706e-01  -0.995 0.321331    
## enginetypel                2.539e-01  1.556e-01   1.631 0.104696    
## enginetypeohc             -6.270e-03  5.212e-02  -0.120 0.904380    
## enginetypeohcf             4.056e-01  1.532e-01   2.648 0.008881 ** 
## enginetypeohcv             1.405e-03  6.349e-02   0.022 0.982375    
## enginetyperotor            1.571e-01  9.224e-02   1.704 0.090338 .  
## enginesize                 1.728e-03  7.853e-04   2.200 0.029183 *  
## peakrpm                    7.001e-05  2.979e-05   2.351 0.019922 *  
## wheelbase                  1.445e-02  4.501e-03   3.211 0.001586 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1135 on 166 degrees of freedom
## Multiple R-squared:  0.9587, Adjusted R-squared:  0.9492 
## F-statistic: 101.3 on 38 and 166 DF,  p-value: < 2.2e-16
print(AIC(lprice2))
## [1] -273.476
lprice3 <- lm(formula = log(price) ~ aspiration + carbody +
                wheelbase +
                carheight + curbweight +
                enginetype + enginesize +
                fuelsystem + peakrpm +
                carManufacturer, data = df)

summary(lprice3)
## 
## Call:
## lm(formula = log(price) ~ aspiration + carbody + wheelbase + 
##     carheight + curbweight + enginetype + enginesize + fuelsystem + 
##     peakrpm + carManufacturer, data = df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.306403 -0.059466  0.002988  0.064348  0.314524 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                7.736e+00  4.631e-01  16.706  < 2e-16 ***
## aspirationturbo            1.217e-01  3.325e-02   3.660 0.000342 ***
## carbodyhardtop            -1.918e-01  7.002e-02  -2.739 0.006864 ** 
## carbodyhatchback          -2.142e-01  6.406e-02  -3.345 0.001026 ** 
## carbodysedan              -1.659e-01  6.558e-02  -2.530 0.012366 *  
## carbodywagon              -1.953e-01  7.244e-02  -2.695 0.007783 ** 
## wheelbase                  1.596e-02  4.072e-03   3.920 0.000131 ***
## carheight                 -2.896e-02  7.474e-03  -3.875 0.000155 ***
## curbweight                 5.087e-04  6.313e-05   8.058 1.69e-13 ***
## enginetypedohcv           -9.949e-02  1.669e-01  -0.596 0.552042    
## enginetypel                1.936e-01  1.560e-01   1.241 0.216369    
## enginetypeohc             -6.275e-03  5.235e-02  -0.120 0.904743    
## enginetypeohcf             3.921e-01  1.517e-01   2.585 0.010623 *  
## enginetypeohcv            -2.500e-03  6.349e-02  -0.039 0.968637    
## enginetyperotor            2.660e-01  1.343e-01   1.980 0.049415 *  
## enginesize                 1.616e-03  8.120e-04   1.990 0.048348 *  
## fuelsystem2bbl             1.297e-01  9.041e-02   1.434 0.153423    
## fuelsystem4bbl             7.168e-03  1.603e-01   0.045 0.964398    
## fuelsystemidi              1.361e-01  1.011e-01   1.346 0.180086    
## fuelsystemmfi              1.509e-01  1.564e-01   0.964 0.336281    
## fuelsystemmpfi             1.882e-01  9.376e-02   2.008 0.046361 *  
## fuelsystemspdi             1.955e-01  1.087e-01   1.798 0.073994 .  
## fuelsystemspfi             4.712e-02  1.644e-01   0.287 0.774782    
## peakrpm                    4.587e-05  3.363e-05   1.364 0.174493    
## carManufactureraudi        5.704e-02  1.068e-01   0.534 0.593966    
## carManufacturerbmw         3.136e-01  1.069e-01   2.933 0.003847 ** 
## carManufacturerbuick      -2.161e-03  1.111e-01  -0.019 0.984506    
## carManufacturerchevrolet  -2.583e-01  1.213e-01  -2.128 0.034828 *  
## carManufacturerdodge      -3.069e-01  9.983e-02  -3.074 0.002483 ** 
## carManufacturerhonda      -3.241e-02  1.234e-01  -0.263 0.793085    
## carManufacturerisuzu      -1.093e-01  1.108e-01  -0.986 0.325790    
## carManufacturerjaguar     -2.958e-01  1.297e-01  -2.281 0.023854 *  
## carManufacturermazda      -7.949e-02  9.315e-02  -0.853 0.394737    
## carManufacturermercury    -1.027e-01  1.518e-01  -0.677 0.499565    
## carManufacturermitsubishi -3.664e-01  1.008e-01  -3.635 0.000374 ***
## carManufacturernissan     -1.686e-01  8.979e-02  -1.878 0.062269 .  
## carManufacturerpeugeot    -5.217e-01  1.849e-01  -2.822 0.005383 ** 
## carManufacturerplymouth   -3.317e-01  1.021e-01  -3.250 0.001409 ** 
## carManufacturerporsche     3.314e-01  1.503e-01   2.205 0.028857 *  
## carManufacturerrenault    -2.774e-01  1.211e-01  -2.291 0.023266 *  
## carManufacturersaab        5.054e-02  1.010e-01   0.501 0.617355    
## carManufacturersubaru     -6.310e-01  1.747e-01  -3.611 0.000408 ***
## carManufacturertoyota     -2.199e-01  8.429e-02  -2.609 0.009944 ** 
## carManufacturervolkswagen -1.268e-01  9.476e-02  -1.338 0.182663    
## carManufacturervolvo      -9.875e-02  1.058e-01  -0.933 0.352221    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.113 on 160 degrees of freedom
## Multiple R-squared:  0.9606, Adjusted R-squared:  0.9497 
## F-statistic: 88.55 on 44 and 160 DF,  p-value: < 2.2e-16
print(AIC(lprice3))
## [1] -271.0404
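
The values from `AIC()` (e.g. -273.476 for lprice2) do not match the AIC in the stepwise trace for the same formula (-857.24). For `lm` fits the two scales differ only by a constant that depends on the sample size, so model rankings are unaffected. A minimal sketch on the built-in mtcars data (a stand-in, not the car price data), assuming the standard definitions behind `extractAIC()` and `AIC()`:

```r
# Stand-in model on built-in data; the relationship holds for any lm fit
fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nobs(fit)

aic_step <- extractAIC(fit)[2]  # scale shown in step()/stepAIC() traces
aic_full <- AIC(fit)            # based on the full Gaussian log-likelihood

# The gap is n * (1 + log(2 * pi)) + 2: the dropped likelihood constant,
# plus one extra parameter because AIC() also counts the error variance
offset <- n * (1 + log(2 * pi)) + 2
all.equal(aic_step + offset, aic_full)
```

With n = 205 the offset is roughly 583.8, which accounts for the gap between -857.24 and -273.476.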

6 Working / Old code

6.1 Pricing Assistant Model

Our main goal is to get an accurate predicted price \(\hat{y}\) for a new car.

BE note: A good pricing assistant chimes with goals of assessment and the lecture content.

6.1.1 Attempt 01

Excluding predictors that are highly correlated (parameter choice)

# Y = price
# X = horsepower, enginesize, curbweight, highwaympg,
#     fueltype, aspiration, carbody, drivewheel, cylinderNum

mdl.pa1 <- lm(
  price ~ horsepower + enginesize + curbweight + highwaympg +
    fueltype + aspiration + carbody + drivewheel + cylinderNum,
  data = df
)

print(summary(mdl.pa1))
## 
## Call:
## lm(formula = price ~ horsepower + enginesize + curbweight + highwaympg + 
##     fueltype + aspiration + carbody + drivewheel + cylinderNum, 
##     data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8084.9 -1197.2   -81.7  1316.3 14226.0 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -765.673   5565.870  -0.138  0.89073    
## horsepower              44.333     13.657   3.246  0.00138 ** 
## enginesize              56.908     18.424   3.089  0.00231 ** 
## curbweight               3.238      1.518   2.133  0.03420 *  
## highwaympg              46.589     74.310   0.627  0.53145    
## fueltypegas           -629.559   1227.468  -0.513  0.60863    
## aspirationturbo        400.001    825.850   0.484  0.62870    
## carbodyhardtop       -2067.166   1699.423  -1.216  0.22536    
## carbodyhatchback     -4742.402   1360.316  -3.486  0.00061 ***
## carbodysedan         -3091.102   1336.340  -2.313  0.02180 *  
## carbodywagon         -4489.628   1498.218  -2.997  0.00310 ** 
## drivewheelfwd           26.116   1141.418   0.023  0.98177    
## drivewheelrwd         1711.372   1229.935   1.391  0.16574    
## cylinderNumfour      -4860.555   1113.372  -4.366 2.09e-05 ***
## cylinderNumgeq_eight  2254.206   2250.742   1.002  0.31785    
## cylinderNumleq_three  -655.617   2039.925  -0.321  0.74827    
## cylinderNumsix       -2068.122   1369.065  -1.511  0.13257    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3053 on 188 degrees of freedom
## Multiple R-squared:  0.8654, Adjusted R-squared:  0.8539 
## F-statistic: 75.54 on 16 and 188 DF,  p-value: < 2.2e-16

F-statistic: 75.54 on 16 and 188 DF, p-value: < 2.2e-16

  • Test of the null hypothesis \(H_0: \beta_1 = \beta_2 = \dots = 0\)
  • i.e. that none of the predictors helps explain price
  • The F-value is large and the p-value is small, so we can reject this null hypothesis
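
The overall F-statistic is the same test as an ANOVA comparison between the fitted model and an intercept-only model. A short sketch on the built-in mtcars data (illustrative stand-in; the predictor choice is an assumption, not part of the assignment):

```r
# Stand-in data: mtcars ships with R
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)   # intercept-only model under H0

# F-statistic as reported at the bottom of summary()
f_summary <- unname(summary(full)$fstatistic["value"])

# Same statistic from a nested-model comparison
cmp <- anova(null, full)
f_anova <- cmp$F[2]

all.equal(f_summary, f_anova)
```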

Adjusted R-squared: 0.8539

BE comments: Model accounts for ~85% of the variability in price (Adj. R-squared = 85.4%, F-statistic p-value < 2.2e-16).

Quite a few predictors have high p-values (not statistically significant)

Next step: use backward selection. stepAIC() repeatedly removes the term whose deletion lowers the AIC the most and re-fits the model, stopping when no removal improves the AIC (candidates are ranked by AIC, not by p-value).

mdl.pa2 <- stepAIC(mdl.pa1, direction = "backward", trace = TRUE)
## Start:  AIC=3306.08
## price ~ horsepower + enginesize + curbweight + highwaympg + fueltype + 
##     aspiration + carbody + drivewheel + cylinderNum
## 
##               Df Sum of Sq        RSS    AIC
## - aspiration   1   2186978 1754787582 3304.3
## - fueltype     1   2452323 1755052928 3304.4
## - highwaympg   1   3664316 1756264920 3304.5
## <none>                     1752600604 3306.1
## - drivewheel   2  53740715 1806341319 3308.3
## - curbweight   1  42424115 1795024719 3309.0
## - enginesize   1  88938316 1841538921 3314.2
## - horsepower   1  98234746 1850835351 3315.3
## - carbody      4 193349724 1945950328 3319.5
## - cylinderNum  4 327945047 2080545651 3333.2
## 
## Step:  AIC=3304.33
## price ~ horsepower + enginesize + curbweight + highwaympg + fueltype + 
##     carbody + drivewheel + cylinderNum
## 
##               Df Sum of Sq        RSS    AIC
## - highwaympg   1   2819841 1757607423 3302.7
## - fueltype     1   7240661 1762028243 3303.2
## <none>                     1754787582 3304.3
## - drivewheel   2  51554337 1806341919 3306.3
## - curbweight   1  44899211 1799686793 3307.5
## - enginesize   1  87087320 1841874902 3312.3
## - carbody      4 191244297 1946031880 3317.5
## - horsepower   1 141840989 1896628571 3318.3
## - cylinderNum  4 340069822 2094857404 3332.6
## 
## Step:  AIC=3302.66
## price ~ horsepower + enginesize + curbweight + fueltype + carbody + 
##     drivewheel + cylinderNum
## 
##               Df Sum of Sq        RSS    AIC
## <none>                     1757607423 3302.7
## - fueltype     1  18328613 1775936037 3302.8
## - drivewheel   2  51993718 1809601142 3304.6
## - curbweight   1  47386509 1804993933 3306.1
## - enginesize   1  90937348 1848544771 3311.0
## - carbody      4 188713756 1946321179 3315.6
## - horsepower   1 140987873 1898595296 3316.5
## - cylinderNum  4 337658895 2095266319 3330.7
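
Each table in the trace above has the same layout as the output of `drop1()`: one row per candidate deletion, with the resulting RSS and AIC. A sketch of a single backward step done by hand on mtcars (a stand-in dataset; `fit0` and `fit1` are illustrative names):

```r
fit0 <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

tab <- drop1(fit0)               # one row per single-term deletion
tab <- tab[order(tab$AIC), ]     # sort by AIC, like the stepAIC trace
print(tab)

best <- rownames(tab)[1]
fit1 <- fit0
if (best != "<none>") {
  # Refit without the term whose removal gives the lowest AIC
  fit1 <- update(fit0, as.formula(paste(". ~ . -", best)))
}
formula(fit1)
```

Repeating this until `<none>` sorts to the top gives the stopping rule seen in the final step of the trace.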

6.1.1.1 Attempt 01: Model comparison

Check whether the variables removed by stepAIC (aspiration and highwaympg) are jointly significant.

Hypothesis:

  • \(H_0\): The simpler model (mdl.pa2) is sufficient
  • \(H_1\): The first (full) model (mdl.pa1) is significantly better.
print(anova(mdl.pa2, mdl.pa1))
## Analysis of Variance Table
## 
## Model 1: price ~ horsepower + enginesize + curbweight + fueltype + carbody + 
##     drivewheel + cylinderNum
## Model 2: price ~ horsepower + enginesize + curbweight + highwaympg + fueltype + 
##     aspiration + carbody + drivewheel + cylinderNum
##   Res.Df        RSS Df Sum of Sq      F Pr(>F)
## 1    190 1757607423                           
## 2    188 1752600604  2   5006819 0.2685 0.7648
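
The F value in this table can be checked by hand from the two residual sums of squares. A small sketch using the numbers printed above (variable names are illustrative):

```r
# Numbers copied from the ANOVA table above
rss_reduced <- 1757607423   # mdl.pa2, 190 residual df
rss_full    <- 1752600604   # mdl.pa1, 188 residual df
df_reduced  <- 190
df_full     <- 188

q <- df_reduced - df_full   # number of coefficients dropped
f_stat <- ((rss_reduced - rss_full) / q) / (rss_full / df_full)
p_val  <- pf(f_stat, q, df_full, lower.tail = FALSE)

round(c(F = f_stat, p = p_val), 4)
```

This should agree with the table (F = 0.2685, p = 0.7648) up to rounding.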
par(bg = "white")
par(mfrow = c(2, 2))
plot(mdl.pa2)

par(mfrow = c(1, 1))
vif(mdl.pa2)
##                  GVIF Df GVIF^(1/(2*Df))
## horsepower   4.717089  1        2.171886
## enginesize  12.163868  1        3.487674
## curbweight   9.581354  1        3.095376
## fueltype     1.529787  1        1.236846
## carbody      1.980039  4        1.089141
## drivewheel   2.796677  2        1.293185
## cylinderNum  7.050859  4        1.276528

Brief BE summary:

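  • VIF output: every GVIF^(1/(2*Df)) value is below 5, but this scaled measure is comparable to the square root of an ordinary VIF, so the raw GVIFs for enginesize (12.2) and curbweight (9.6) still point to collinearity among the size-related predictors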
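
For a term with a single coefficient, GVIF is the ordinary VIF and GVIF^(1/(2*Df)) is its square root, so the two columns need different thresholds. A sketch that computes VIFs by hand in base R, using numeric mtcars columns as stand-in predictors (an assumption for illustration; `car::vif()` is what the report itself uses):

```r
# Stand-in predictors from mtcars (all numeric, one coefficient each)
X <- mtcars[, c("wt", "hp", "disp")]

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on the remaining predictors
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_corr <- diag(solve(cor(X)))

rbind(VIF = vif_manual, sqrt_VIF = sqrt(vif_manual))
```

In the table above, enginesize has GVIF 12.16 and GVIF^(1/(2*Df)) 3.49, and 3.49 is the square root of 12.16.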

Plots show two issues

  • Heteroscedasticity in the residuals vs fitted plot (a funnel shape: the spread of the residuals grows as the predicted price increases)
  • Some leverage points / large outliers
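
A funnel shape can be backed up with a formal check. Below is a minimal sketch of the studentized (Koenker) form of the Breusch-Pagan test computed by hand on simulated data; the data and variable names are illustrative assumptions, not the car data. The lmtest package provides `bptest()` for the same purpose.

```r
set.seed(1)
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = x)   # error spread grows with x (funnel)

fit <- lm(y ~ x)

# Auxiliary regression of the squared residuals on the predictors
aux <- lm(resid(fit)^2 ~ x)

# Test statistic: n * R^2, compared with chi-squared on
# (number of predictors) degrees of freedom
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(statistic = bp_stat, p_value = bp_p)
```

A small p-value is evidence against constant residual variance; the same steps could be applied to mdl.pa2 with its own predictors in the auxiliary regression.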

6.1.2 Attempt 02

To address the heteroscedasticity, we model log(price) instead of price, and also look at what happens when carManufacturer is included.

mdl.pa3 <- lm(
  log(price) ~ horsepower + enginesize + curbweight + highwaympg +
    fueltype + aspiration + carbody + drivewheel + cylinderNum +
    carManufacturer,
  data = df
)
summary(mdl.pa3)
## 
## Call:
## lm(formula = log(price) ~ horsepower + enginesize + curbweight + 
##     highwaympg + fueltype + aspiration + carbody + drivewheel + 
##     cylinderNum + carManufacturer, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.38320 -0.06125  0.00149  0.07107  0.39165 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                7.921e+00  2.881e-01  27.498  < 2e-16 ***
## horsepower                 1.845e-03  9.248e-04   1.995 0.047619 *  
## enginesize                 1.392e-03  9.460e-04   1.471 0.143096    
## curbweight                 4.859e-04  7.902e-05   6.149 5.56e-09 ***
## highwaympg                -2.788e-03  3.509e-03  -0.795 0.427919    
## fueltypegas               -1.764e-02  6.014e-02  -0.293 0.769692    
## aspirationturbo            7.776e-02  4.274e-02   1.819 0.070675 .  
## carbodyhardtop            -1.651e-01  7.622e-02  -2.166 0.031741 *  
## carbodyhatchback          -2.283e-01  6.453e-02  -3.539 0.000521 ***
## carbodysedan              -1.645e-01  6.479e-02  -2.538 0.012051 *  
## carbodywagon              -2.453e-01  7.022e-02  -3.493 0.000611 ***
## drivewheelfwd              4.103e-02  5.471e-02   0.750 0.454396    
## drivewheelrwd              7.213e-02  6.621e-02   1.089 0.277540    
## cylinderNumfour            1.106e-01  8.426e-02   1.313 0.191139    
## cylinderNumgeq_eight      -3.306e-02  1.046e-01  -0.316 0.752281    
## cylinderNumleq_three       3.267e-01  1.145e-01   2.854 0.004861 ** 
## cylinderNumsix             1.219e-01  8.638e-02   1.411 0.160179    
## carManufactureraudi        2.936e-01  1.206e-01   2.435 0.015924 *  
## carManufacturerbmw         3.465e-01  9.778e-02   3.543 0.000513 ***
## carManufacturerbuick       2.318e-01  1.365e-01   1.697 0.091480 .  
## carManufacturerchevrolet  -1.765e-01  1.241e-01  -1.422 0.156774    
## carManufacturerdodge      -1.828e-01  9.899e-02  -1.847 0.066510 .  
## carManufacturerhonda      -7.288e-02  9.518e-02  -0.766 0.444941    
## carManufacturerisuzu      -6.875e-02  1.073e-01  -0.641 0.522474    
## carManufacturerjaguar     -1.046e-01  1.345e-01  -0.778 0.437904    
## carManufacturermazda      -3.609e-02  9.384e-02  -0.385 0.701016    
## carManufacturermercury    -8.873e-02  1.551e-01  -0.572 0.568014    
## carManufacturermitsubishi -2.249e-01  9.760e-02  -2.304 0.022457 *  
## carManufacturernissan     -1.415e-01  9.224e-02  -1.534 0.126826    
## carManufacturerpeugeot    -1.686e-01  1.065e-01  -1.583 0.115363    
## carManufacturerplymouth   -2.120e-01  1.004e-01  -2.112 0.036137 *  
## carManufacturerporsche     4.521e-01  1.084e-01   4.170 4.88e-05 ***
## carManufacturerrenault    -1.369e-01  1.282e-01  -1.068 0.286926    
## carManufacturersaab        7.133e-02  1.082e-01   0.659 0.510726    
## carManufacturersubaru     -1.630e-01  9.795e-02  -1.664 0.098036 .  
## carManufacturertoyota     -1.511e-01  8.732e-02  -1.730 0.085431 .  
## carManufacturervolkswagen -4.404e-02  9.448e-02  -0.466 0.641742    
## carManufacturervolvo       9.085e-03  9.783e-02   0.093 0.926122    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1248 on 167 degrees of freedom
## Multiple R-squared:  0.9498, Adjusted R-squared:  0.9386 
## F-statistic: 85.34 on 37 and 167 DF,  p-value: < 2.2e-16
par(bg = "white")
par(mfrow = c(2, 2))
plot(mdl.pa3)

par(mfrow = c(1, 1))
mdl.pa4 <- stepAIC(mdl.pa3, direction = "backward", trace = TRUE)
## Start:  AIC=-819.25
## log(price) ~ horsepower + enginesize + curbweight + highwaympg + 
##     fueltype + aspiration + carbody + drivewheel + cylinderNum + 
##     carManufacturer
## 
##                   Df Sum of Sq    RSS     AIC
## - drivewheel       2   0.01873 2.6198 -821.78
## - fueltype         1   0.00134 2.6024 -821.15
## - highwaympg       1   0.00984 2.6109 -820.48
## <none>                         2.6011 -819.25
## - enginesize       1   0.03372 2.6348 -818.61
## - aspiration       1   0.05155 2.6526 -817.23
## - horsepower       1   0.06202 2.6631 -816.42
## - cylinderNum      4   0.15826 2.7593 -815.14
## - carbody          4   0.33738 2.9385 -802.25
## - curbweight       1   0.58891 3.1900 -779.41
## - carManufacturer 21   2.77342 5.3745 -712.48
## 
## Step:  AIC=-821.78
## log(price) ~ horsepower + enginesize + curbweight + highwaympg + 
##     fueltype + aspiration + carbody + cylinderNum + carManufacturer
## 
##                   Df Sum of Sq    RSS     AIC
## - fueltype         1   0.00495 2.6248 -823.39
## - highwaympg       1   0.01111 2.6309 -822.91
## <none>                         2.6198 -821.78
## - enginesize       1   0.03664 2.6565 -820.93
## - aspiration       1   0.03803 2.6578 -820.83
## - horsepower       1   0.10127 2.7211 -816.01
## - cylinderNum      4   0.19570 2.8155 -815.01
## - carbody          4   0.34719 2.9670 -804.27
## - curbweight       1   0.61217 3.2320 -780.73
## - carManufacturer 21   2.92664 5.5465 -710.02
## 
## Step:  AIC=-823.39
## log(price) ~ horsepower + enginesize + curbweight + highwaympg + 
##     aspiration + carbody + cylinderNum + carManufacturer
## 
##                   Df Sum of Sq    RSS     AIC
## - highwaympg       1   0.00642 2.6312 -824.89
## <none>                         2.6248 -823.39
## - enginesize       1   0.04331 2.6681 -822.04
## - aspiration       1   0.08273 2.7075 -819.03
## - horsepower       1   0.10214 2.7269 -817.57
## - cylinderNum      4   0.21732 2.8421 -815.09
## - carbody          4   0.35725 2.9820 -805.23
## - curbweight       1   0.78807 3.4128 -771.57
## - carManufacturer 21   3.07072 5.6955 -706.58
## 
## Step:  AIC=-824.89
## log(price) ~ horsepower + enginesize + curbweight + aspiration + 
##     carbody + cylinderNum + carManufacturer
## 
##                   Df Sum of Sq    RSS     AIC
## <none>                         2.6312 -824.89
## - enginesize       1   0.04007 2.6713 -823.79
## - aspiration       1   0.07675 2.7079 -821.00
## - cylinderNum      4   0.23362 2.8648 -815.45
## - horsepower       1   0.17210 2.8033 -813.90
## - carbody          4   0.36112 2.9923 -806.53
## - curbweight       1   0.90130 3.5325 -766.51
## - carManufacturer 21   3.06450 5.6957 -708.58
summary(mdl.pa4)
## 
## Call:
## lm(formula = log(price) ~ horsepower + enginesize + curbweight + 
##     aspiration + carbody + cylinderNum + carManufacturer, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35131 -0.06436  0.00549  0.06560  0.41121 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                7.785e+00  1.636e-01  47.580  < 2e-16 ***
## horsepower                 2.234e-03  6.681e-04   3.344 0.001013 ** 
## enginesize                 1.447e-03  8.969e-04   1.614 0.108415    
## curbweight                 5.068e-04  6.622e-05   7.653 1.37e-12 ***
## aspirationturbo            7.176e-02  3.213e-02   2.233 0.026821 *  
## carbodyhardtop            -1.666e-01  7.531e-02  -2.213 0.028256 *  
## carbodyhatchback          -2.359e-01  6.357e-02  -3.710 0.000280 ***
## carbodysedan              -1.730e-01  6.330e-02  -2.734 0.006922 ** 
## carbodywagon              -2.583e-01  6.863e-02  -3.763 0.000230 ***
## cylinderNumfour            1.188e-01  8.188e-02   1.451 0.148706    
## cylinderNumgeq_eight      -6.697e-02  1.010e-01  -0.663 0.508069    
## cylinderNumleq_three       3.681e-01  1.092e-01   3.370 0.000928 ***
## cylinderNumsix             1.245e-01  8.534e-02   1.458 0.146543    
## carManufactureraudi        2.699e-01  1.142e-01   2.364 0.019184 *  
## carManufacturerbmw         3.455e-01  9.617e-02   3.592 0.000428 ***
## carManufacturerbuick       2.512e-01  1.316e-01   1.909 0.058000 .  
## carManufacturerchevrolet  -2.239e-01  1.160e-01  -1.931 0.055195 .  
## carManufacturerdodge      -2.013e-01  9.465e-02  -2.127 0.034895 *  
## carManufacturerhonda      -9.350e-02  9.116e-02  -1.026 0.306487    
## carManufacturerisuzu      -7.672e-02  1.051e-01  -0.730 0.466217    
## carManufacturerjaguar     -1.273e-01  1.302e-01  -0.978 0.329502    
## carManufacturermazda      -4.852e-02  9.119e-02  -0.532 0.595326    
## carManufacturermercury    -9.729e-02  1.538e-01  -0.633 0.527732    
## carManufacturermitsubishi -2.475e-01  9.162e-02  -2.702 0.007593 ** 
## carManufacturernissan     -1.631e-01  8.862e-02  -1.841 0.067378 .  
## carManufacturerpeugeot    -1.536e-01  1.032e-01  -1.488 0.138565    
## carManufacturerplymouth   -2.281e-01  9.692e-02  -2.353 0.019755 *  
## carManufacturerporsche     4.238e-01  1.026e-01   4.130 5.65e-05 ***
## carManufacturerrenault    -1.577e-01  1.237e-01  -1.275 0.204105    
## carManufacturersaab        4.221e-02  9.970e-02   0.423 0.672556    
## carManufacturersubaru     -1.925e-01  9.212e-02  -2.090 0.038133 *  
## carManufacturertoyota     -1.648e-01  8.551e-02  -1.927 0.055623 .  
## carManufacturervolkswagen -6.172e-02  9.032e-02  -0.683 0.495316    
## carManufacturervolvo       1.254e-02  9.489e-02   0.132 0.894990    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.124 on 171 degrees of freedom
## Multiple R-squared:  0.9492, Adjusted R-squared:  0.9394 
## F-statistic: 96.79 on 33 and 171 DF,  p-value: < 2.2e-16
par(bg = "white")
par(mfrow = c(2, 2))
plot(mdl.pa4)

par(mfrow = c(1, 1))
print(vif(mdl.pa4))
##                       GVIF Df GVIF^(1/(2*Df))
## horsepower        9.254549  1        3.042129
## enginesize       18.493058  1        4.300356
## curbweight       15.763028  1        3.970268
## aspiration        2.034585  1        1.426389
## carbody           4.036524  4        1.190559
## cylinderNum      71.722557  4        1.705913
## carManufacturer 383.938806 21        1.152206
print(anova(mdl.pa3, mdl.pa4))
## Analysis of Variance Table
## 
## Model 1: log(price) ~ horsepower + enginesize + curbweight + highwaympg + 
##     fueltype + aspiration + carbody + drivewheel + cylinderNum + 
##     carManufacturer
## Model 2: log(price) ~ horsepower + enginesize + curbweight + aspiration + 
##     carbody + cylinderNum + carManufacturer
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    167 2.6011                           
## 2    171 2.6312 -4 -0.030097 0.4831 0.7481

6.1.2.1 Summary

  • The residuals vs fitted plot for mdl.pa4 now shows a roughly random horizontal band of points with a flat red line. This suggests the residual variance is approximately constant, so the standard errors and p-values are more trustworthy.
  • GVIF values are high, partly because carManufacturer and cylinderNum have many levels. All GVIF^(1/(2*Df)) values are below 5, but that scale is the square root of a VIF, so enginesize (4.30) and curbweight (3.97) still indicate notable collinearity between the size-related predictors.
  • The ANOVA comparison of the two models fails to reject the null hypothesis (\(H_0\)) that the variables we removed (highwaympg, fueltype, and drivewheel) all have coefficients equal to zero.
    • p-value (0.7481) is large -> there is no statistical evidence that highwaympg, fueltype, and drivewheel as a group improve the model.

mdl.pa4 (simpler) is just as good as mdl.pa3, so we should use mdl.pa4.

6.1.3 Demo: testing this model’s use to answer our initial question

BE note: we know from data source that these prices are in dollars…

# Example properties of new car
# BE note: have to use exact text for factor vals
# Also a limitation of this (not built in) is that it doesn't check for limits
# of data (i.e., if we put in a horsepower over the max value the model won't
# be valid, but this isn't accounted for here...)
# this also doesn't account for types of engine produced by companies
new_car <- data.frame(
  horsepower = 100,
  enginesize = 120,
  curbweight = 2500,
  aspiration = as.factor("std"),
  carbody = as.factor("sedan"),
  cylinderNum = as.factor("four"),
  carManufacturer = as.factor("toyota") # peugeot/volvo/porsche
)

# use predict() to estimate fit
pred_log_scale <-
  predict(mdl.pa4, newdata = new_car, interval = "prediction")

print("Prediction on log($) scale:")
## [1] "Prediction on log($) scale:"
print(pred_log_scale)
##        fit      lwr      upr
## 1 9.230509 8.979491 9.481526
# convert back from log price for fit, lwr, upr
pred_dollar_scale <- exp(pred_log_scale)

print("Prediction on dollar ($) scale:")
## [1] "Prediction on dollar ($) scale:"
print(pred_dollar_scale)
##        fit      lwr      upr
## 1 10203.73 7938.589 13115.19
# All in one...
car_props <- paste(
  new_car$carManufacturer, new_car$carbody,
  "with", new_car$horsepower, "hp,",
  new_car$enginesize, "enginesize, and",
  new_car$curbweight, "lb curbweight"
)

fit_price <- pred_dollar_scale[1, "fit"]
lwr_price <- pred_dollar_scale[1, "lwr"]
upr_price <- pred_dollar_scale[1, "upr"]

report_string <- sprintf(
  "\nBased on our final model, for a %s:
  \n> The single best price estimate (yhat) is $%.0f.
  \n> We can say we are 95%% confident that the listing price for a car with
  these specific specifications would fall within prediction interval of
  [$%.0f, $%.0f].",
  car_props,
  fit_price,
  lwr_price,
  upr_price
)
cat(report_string)
## 
## Based on our final model, for a toyota sedan with 100 hp, 120 enginesize, and 2500 lb curbweight:
##   
## > The single best price estimate (yhat) is $10204.
##   
## > We can say we are 95% confident that the listing price for a car with
##   these specific specifications would fall within prediction interval of
##   [$7939, $13115].
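
Exponentiating a prediction made on the log scale estimates the median price rather than the mean. If the log-scale errors are assumed normal, a common adjustment for the mean multiplies by exp(sigma^2 / 2). A sketch using the values printed above (fit 9.230509; residual standard error 0.124 for mdl.pa4); whether the mean or the median is the better "single best estimate" is a modelling choice:

```r
fit_log <- 9.230509   # log-scale prediction printed above
sigma   <- 0.124      # residual standard error of mdl.pa4

median_price <- exp(fit_log)                # plain back-transform
mean_price   <- exp(fit_log + sigma^2 / 2)  # lognormal mean adjustment

round(c(median = median_price, mean = mean_price))
```

The interval endpoints need no adjustment, because quantiles are preserved by the monotone exp() transform.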

6.1.3.1 Demo: testing function

get_price_prediction_report <- function(model,
                                        hp,
                                        engine_size,
                                        curb_weight,
                                        asp,
                                        body,
                                        cyl,
                                        mfr) {
  new_car <- data.frame(
    horsepower = hp,
    enginesize = engine_size,
    curbweight = curb_weight,
    aspiration = as.factor(asp),
    carbody = as.factor(body),
    cylinderNum = as.factor(cyl),
    carManufacturer = as.factor(mfr)
  )
  pred_log_scale <- predict(model, newdata = new_car, interval = "prediction")
  pred_dollar_scale <- exp(pred_log_scale)
  car_props <- paste(
    new_car$carManufacturer, new_car$carbody,
    "with", new_car$horsepower, "hp,",
    new_car$enginesize, "enginesize, and",
    new_car$curbweight, "lb curbweight"
  )
  fit_price <- pred_dollar_scale[1, "fit"]
  lwr_price <- pred_dollar_scale[1, "lwr"]
  upr_price <- pred_dollar_scale[1, "upr"]
  report_string <- sprintf(
    "\nBased on our final model, for a %s:
    > The single best price estimate (y-hat) is $%.0f
    > 95%% CI for this specific car is [$%.0f, $%.0f]",
    car_props,
    fit_price,
    lwr_price,
    upr_price
  )

  cat(report_string)
}
# test 01: same toyota (to check :))
get_price_prediction_report(
  model = mdl.pa4,
  hp = 100,
  engine_size = 120,
  curb_weight = 2500,
  asp = "std",
  body = "sedan",
  cyl = "four",
  mfr = "toyota"
)
## 
## Based on our final model, for a toyota sedan with 100 hp, 120 enginesize, and 2500 lb curbweight:
##     > The single best price estimate (y-hat) is $10204
##     > 95% CI for this specific car is [$7939, $13115]
# test 02: porsche
get_price_prediction_report(
  model = mdl.pa4,
  hp = 200,
  engine_size = 180,
  curb_weight = 3000,
  asp = "turbo",
  body = "hardtop",
  cyl = "six",
  mfr = "porsche"
)
## 
## Based on our final model, for a porsche hardtop with 200 hp, 180 enginesize, and 3000 lb curbweight:
##     > The single best price estimate (y-hat) is $35127
##     > 95% CI for this specific car is [$26391, $46756]
# test 03: honda
get_price_prediction_report(
  model = mdl.pa4,
  hp = 150,
  engine_size = 120,
  curb_weight = 1000,
  asp = "std",
  body = "convertible",
  cyl = "five",
  mfr = "honda"
)
## 
## Based on our final model, for a honda convertible with 150 hp, 120 enginesize, and 1000 lb curbweight:
##     > The single best price estimate (y-hat) is $6048
##     > 95% CI for this specific car is [$4075, $8977]
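
The interval reported by the function above comes from predict(..., interval = "prediction"), so it is a prediction interval for one new car rather than a confidence interval for the mean price. A minimal sketch of the difference, using the built-in mtcars data and a hypothetical log(mpg) model as stand-ins for df and mdl.pa4 (neither the data nor the formula below come from this project):

```r
# Stand-in model: log response, as with log(price) in mdl.pa4 (assumption)
fit <- lm(log(mpg) ~ wt + hp, data = mtcars)

# One hypothetical new observation
new_obs <- data.frame(wt = 3, hp = 120)

# Intervals on the log scale
pi_log <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)
ci_log <- predict(fit, newdata = new_obs, interval = "confidence", level = 0.95)

# Back-transform to the original scale
pi_orig <- exp(pi_log)
ci_orig <- exp(ci_log)

# The prediction interval should be the wider of the two
print(rbind(prediction = pi_orig[1, ], confidence = ci_orig[1, ]))
```

Under the usual log-normal error assumption, exp() of the fitted value estimates the median of the response rather than its mean, which is worth bearing in mind when quoting a "single best estimate".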

6.1.4 Limitations

Observation 76 is a high-leverage point:

  • coefficient estimates may be skewed by this single observation
  • the model may be overfit to this point
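
One way to check a leverage claim like this is with hatvalues() and cooks.distance(). The sketch below uses simulated data; the toy data frame, the planted extreme value and the 2p/n cut-off are illustrative assumptions, not taken from this analysis:

```r
set.seed(42)
n <- 50

# Simulated predictor with one planted extreme value (hypothetical)
toy <- data.frame(x = rnorm(n))
toy$x[n] <- 8
toy$y <- 2 + 0.3 * toy$x + rnorm(n, sd = 0.5)

fit <- lm(y ~ x, data = toy)

# Leverage (diagonal of the hat matrix) and Cook's distance
h <- hatvalues(fit)
cooks <- cooks.distance(fit)

# Common rule of thumb: flag leverage above 2p/n
p <- length(coef(fit))
flagged <- unname(which(h > 2 * p / n))

print(data.frame(
  obs = flagged,
  leverage = unname(h[flagged]),
  cooks_d = unname(cooks[flagged])
))
```

High leverage alone does not mean a point is distorting the fit; Cook's distance combines leverage with the size of the residual, so it is the more direct measure of influence.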

6.1.5 Suggestions for Improvements

  • We could remove this point and refit the model to check whether the coefficients are stable and whether the model generalises better
  • BE note: don’t know if we want to do this in the model or leave it and discuss here?

6.1.5.1 Attempt 03 (may not include in final version)

row_to_remove <- 76

df_fixed <- df[-row_to_remove, ]

mdl.pa5 <- lm(formula(mdl.pa4), data = df_fixed)

print(summary(mdl.pa4))
## 
## Call:
## lm(formula = log(price) ~ horsepower + enginesize + curbweight + 
##     aspiration + carbody + cylinderNum + carManufacturer, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35131 -0.06436  0.00549  0.06560  0.41121 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                7.785e+00  1.636e-01  47.580  < 2e-16 ***
## horsepower                 2.234e-03  6.681e-04   3.344 0.001013 ** 
## enginesize                 1.447e-03  8.969e-04   1.614 0.108415    
## curbweight                 5.068e-04  6.622e-05   7.653 1.37e-12 ***
## aspirationturbo            7.176e-02  3.213e-02   2.233 0.026821 *  
## carbodyhardtop            -1.666e-01  7.531e-02  -2.213 0.028256 *  
## carbodyhatchback          -2.359e-01  6.357e-02  -3.710 0.000280 ***
## carbodysedan              -1.730e-01  6.330e-02  -2.734 0.006922 ** 
## carbodywagon              -2.583e-01  6.863e-02  -3.763 0.000230 ***
## cylinderNumfour            1.188e-01  8.188e-02   1.451 0.148706    
## cylinderNumgeq_eight      -6.697e-02  1.010e-01  -0.663 0.508069    
## cylinderNumleq_three       3.681e-01  1.092e-01   3.370 0.000928 ***
## cylinderNumsix             1.245e-01  8.534e-02   1.458 0.146543    
## carManufactureraudi        2.699e-01  1.142e-01   2.364 0.019184 *  
## carManufacturerbmw         3.455e-01  9.617e-02   3.592 0.000428 ***
## carManufacturerbuick       2.512e-01  1.316e-01   1.909 0.058000 .  
## carManufacturerchevrolet  -2.239e-01  1.160e-01  -1.931 0.055195 .  
## carManufacturerdodge      -2.013e-01  9.465e-02  -2.127 0.034895 *  
## carManufacturerhonda      -9.350e-02  9.116e-02  -1.026 0.306487    
## carManufacturerisuzu      -7.672e-02  1.051e-01  -0.730 0.466217    
## carManufacturerjaguar     -1.273e-01  1.302e-01  -0.978 0.329502    
## carManufacturermazda      -4.852e-02  9.119e-02  -0.532 0.595326    
## carManufacturermercury    -9.729e-02  1.538e-01  -0.633 0.527732    
## carManufacturermitsubishi -2.475e-01  9.162e-02  -2.702 0.007593 ** 
## carManufacturernissan     -1.631e-01  8.862e-02  -1.841 0.067378 .  
## carManufacturerpeugeot    -1.536e-01  1.032e-01  -1.488 0.138565    
## carManufacturerplymouth   -2.281e-01  9.692e-02  -2.353 0.019755 *  
## carManufacturerporsche     4.238e-01  1.026e-01   4.130 5.65e-05 ***
## carManufacturerrenault    -1.577e-01  1.237e-01  -1.275 0.204105    
## carManufacturersaab        4.221e-02  9.970e-02   0.423 0.672556    
## carManufacturersubaru     -1.925e-01  9.212e-02  -2.090 0.038133 *  
## carManufacturertoyota     -1.648e-01  8.551e-02  -1.927 0.055623 .  
## carManufacturervolkswagen -6.172e-02  9.032e-02  -0.683 0.495316    
## carManufacturervolvo       1.254e-02  9.489e-02   0.132 0.894990    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.124 on 171 degrees of freedom
## Multiple R-squared:  0.9492, Adjusted R-squared:  0.9394 
## F-statistic: 96.79 on 33 and 171 DF,  p-value: < 2.2e-16
print(summary(mdl.pa5))
## 
## Call:
## lm(formula = formula(mdl.pa4), data = df_fixed)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35131 -0.06456  0.00564  0.06565  0.41121 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                7.785e+00  1.636e-01  47.580  < 2e-16 ***
## horsepower                 2.234e-03  6.681e-04   3.344 0.001013 ** 
## enginesize                 1.447e-03  8.969e-04   1.614 0.108415    
## curbweight                 5.068e-04  6.622e-05   7.653 1.37e-12 ***
## aspirationturbo            7.176e-02  3.213e-02   2.233 0.026821 *  
## carbodyhardtop            -1.666e-01  7.531e-02  -2.213 0.028256 *  
## carbodyhatchback          -2.359e-01  6.357e-02  -3.710 0.000280 ***
## carbodysedan              -1.730e-01  6.330e-02  -2.734 0.006922 ** 
## carbodywagon              -2.583e-01  6.863e-02  -3.763 0.000230 ***
## cylinderNumfour            1.188e-01  8.188e-02   1.451 0.148706    
## cylinderNumgeq_eight      -6.697e-02  1.010e-01  -0.663 0.508069    
## cylinderNumleq_three       3.681e-01  1.092e-01   3.370 0.000928 ***
## cylinderNumsix             1.245e-01  8.534e-02   1.458 0.146543    
## carManufactureraudi        2.699e-01  1.142e-01   2.364 0.019184 *  
## carManufacturerbmw         3.455e-01  9.617e-02   3.592 0.000428 ***
## carManufacturerbuick       2.512e-01  1.316e-01   1.909 0.058000 .  
## carManufacturerchevrolet  -2.239e-01  1.160e-01  -1.931 0.055195 .  
## carManufacturerdodge      -2.013e-01  9.465e-02  -2.127 0.034895 *  
## carManufacturerhonda      -9.350e-02  9.116e-02  -1.026 0.306487    
## carManufacturerisuzu      -7.672e-02  1.051e-01  -0.730 0.466217    
## carManufacturerjaguar     -1.273e-01  1.302e-01  -0.978 0.329502    
## carManufacturermazda      -4.852e-02  9.119e-02  -0.532 0.595326    
## carManufacturermitsubishi -2.475e-01  9.162e-02  -2.702 0.007593 ** 
## carManufacturernissan     -1.631e-01  8.862e-02  -1.841 0.067378 .  
## carManufacturerpeugeot    -1.536e-01  1.032e-01  -1.488 0.138565    
## carManufacturerplymouth   -2.281e-01  9.692e-02  -2.353 0.019755 *  
## carManufacturerporsche     4.238e-01  1.026e-01   4.130 5.65e-05 ***
## carManufacturerrenault    -1.577e-01  1.237e-01  -1.275 0.204105    
## carManufacturersaab        4.221e-02  9.970e-02   0.423 0.672556    
## carManufacturersubaru     -1.925e-01  9.212e-02  -2.090 0.038133 *  
## carManufacturertoyota     -1.648e-01  8.551e-02  -1.927 0.055623 .  
## carManufacturervolkswagen -6.172e-02  9.032e-02  -0.683 0.495316    
## carManufacturervolvo       1.254e-02  9.489e-02   0.132 0.894990    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.124 on 171 degrees of freedom
## Multiple R-squared:  0.9491, Adjusted R-squared:  0.9395 
## F-statistic: 99.56 on 32 and 171 DF,  p-value: < 2.2e-16
par(bg = "white")
par(mfrow = c(2, 2))
plot(mdl.pa5)

par(mfrow = c(1, 1))

cat("\n--- VIF Scores for mdl.pa5 ---\n")
## 
## --- VIF Scores for mdl.pa5 ---
print(car::vif(mdl.pa5))
##                       GVIF Df GVIF^(1/(2*Df))
## horsepower        9.108072  1        3.017958
## enginesize       18.484053  1        4.299308
## curbweight       15.727048  1        3.965734
## aspiration        1.989300  1        1.410425
## carbody           3.998363  4        1.189146
## cylinderNum      71.620842  4        1.705611
## carManufacturer 359.645296 20        1.158502
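
In the two summaries above, the shared coefficients agree to the printed precision and only carManufacturermercury disappears. This would be expected if observation 76 is the only car from its manufacturer, because its dummy variable then fits it exactly. A small simulated sketch of that behaviour (the toy data and the "solo" level are hypothetical):

```r
set.seed(1)
n <- 40

# Toy data: a numeric predictor and a character "maker" column (assumption)
toy <- data.frame(
  x = rnorm(n),
  maker = rep(c("a", "b", "c"), length.out = n),
  stringsAsFactors = FALSE
)
toy$maker[n] <- "solo" # the last row is the only one at this level
toy$y <- 1 + 0.5 * toy$x + rnorm(n, sd = 0.2)

# Fit with and without the singleton row, reusing the same formula
fit_all <- lm(y ~ x + maker, data = toy)
fit_drop <- lm(formula(fit_all), data = toy[-n, ])

# Compare the coefficients the two fits have in common
shared <- intersect(names(coef(fit_all)), names(coef(fit_drop)))
delta <- coef(fit_drop)[shared] - coef(fit_all)[shared]
print(round(delta, 10))

# Coefficient that exists only in the full fit
dropped_terms <- setdiff(names(coef(fit_all)), names(coef(fit_drop)))
print(dropped_terms)
```

If the same pattern holds for the car data, removing the row neither stabilises nor destabilises the remaining estimates; it only removes the ability to price that manufacturer.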

6.1.5.2 price model 6 (Ardi)

mdl.pa6 <- lm(formula = log(price) ~ aspiration +
       carbody +
       wheelbase +
       carheight + curbweight +
       enginetype + enginesize +
       fuelsystem +
       peakrpm +
       carManufacturer, data = df)


print(summary(mdl.pa6))
## 
## Call:
## lm(formula = log(price) ~ aspiration + carbody + wheelbase + 
##     carheight + curbweight + enginetype + enginesize + fuelsystem + 
##     peakrpm + carManufacturer, data = df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.306403 -0.059466  0.002988  0.064348  0.314524 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                7.736e+00  4.631e-01  16.706  < 2e-16 ***
## aspirationturbo            1.217e-01  3.325e-02   3.660 0.000342 ***
## carbodyhardtop            -1.918e-01  7.002e-02  -2.739 0.006864 ** 
## carbodyhatchback          -2.142e-01  6.406e-02  -3.345 0.001026 ** 
## carbodysedan              -1.659e-01  6.558e-02  -2.530 0.012366 *  
## carbodywagon              -1.953e-01  7.244e-02  -2.695 0.007783 ** 
## wheelbase                  1.596e-02  4.072e-03   3.920 0.000131 ***
## carheight                 -2.896e-02  7.474e-03  -3.875 0.000155 ***
## curbweight                 5.087e-04  6.313e-05   8.058 1.69e-13 ***
## enginetypedohcv           -9.949e-02  1.669e-01  -0.596 0.552042    
## enginetypel                1.936e-01  1.560e-01   1.241 0.216369    
## enginetypeohc             -6.275e-03  5.235e-02  -0.120 0.904743    
## enginetypeohcf             3.921e-01  1.517e-01   2.585 0.010623 *  
## enginetypeohcv            -2.500e-03  6.349e-02  -0.039 0.968637    
## enginetyperotor            2.660e-01  1.343e-01   1.980 0.049415 *  
## enginesize                 1.616e-03  8.120e-04   1.990 0.048348 *  
## fuelsystem2bbl             1.297e-01  9.041e-02   1.434 0.153423    
## fuelsystem4bbl             7.168e-03  1.603e-01   0.045 0.964398    
## fuelsystemidi              1.361e-01  1.011e-01   1.346 0.180086    
## fuelsystemmfi              1.509e-01  1.564e-01   0.964 0.336281    
## fuelsystemmpfi             1.882e-01  9.376e-02   2.008 0.046361 *  
## fuelsystemspdi             1.955e-01  1.087e-01   1.798 0.073994 .  
## fuelsystemspfi             4.712e-02  1.644e-01   0.287 0.774782    
## peakrpm                    4.587e-05  3.363e-05   1.364 0.174493    
## carManufactureraudi        5.704e-02  1.068e-01   0.534 0.593966    
## carManufacturerbmw         3.136e-01  1.069e-01   2.933 0.003847 ** 
## carManufacturerbuick      -2.161e-03  1.111e-01  -0.019 0.984506    
## carManufacturerchevrolet  -2.583e-01  1.213e-01  -2.128 0.034828 *  
## carManufacturerdodge      -3.069e-01  9.983e-02  -3.074 0.002483 ** 
## carManufacturerhonda      -3.241e-02  1.234e-01  -0.263 0.793085    
## carManufacturerisuzu      -1.093e-01  1.108e-01  -0.986 0.325790    
## carManufacturerjaguar     -2.958e-01  1.297e-01  -2.281 0.023854 *  
## carManufacturermazda      -7.949e-02  9.315e-02  -0.853 0.394737    
## carManufacturermercury    -1.027e-01  1.518e-01  -0.677 0.499565    
## carManufacturermitsubishi -3.664e-01  1.008e-01  -3.635 0.000374 ***
## carManufacturernissan     -1.686e-01  8.979e-02  -1.878 0.062269 .  
## carManufacturerpeugeot    -5.217e-01  1.849e-01  -2.822 0.005383 ** 
## carManufacturerplymouth   -3.317e-01  1.021e-01  -3.250 0.001409 ** 
## carManufacturerporsche     3.314e-01  1.503e-01   2.205 0.028857 *  
## carManufacturerrenault    -2.774e-01  1.211e-01  -2.291 0.023266 *  
## carManufacturersaab        5.054e-02  1.010e-01   0.501 0.617355    
## carManufacturersubaru     -6.310e-01  1.747e-01  -3.611 0.000408 ***
## carManufacturertoyota     -2.199e-01  8.429e-02  -2.609 0.009944 ** 
## carManufacturervolkswagen -1.268e-01  9.476e-02  -1.338 0.182663    
## carManufacturervolvo      -9.875e-02  1.058e-01  -0.933 0.352221    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.113 on 160 degrees of freedom
## Multiple R-squared:  0.9606, Adjusted R-squared:  0.9497 
## F-statistic: 88.55 on 44 and 160 DF,  p-value: < 2.2e-16
cat("\n--- VIF Scores for model 6 ---\n")
## 
## --- VIF Scores for model 6 ---
print(car::vif(mdl.pa6))
##                         GVIF Df GVIF^(1/(2*Df))
## aspiration      2.625732e+00  1        1.620411
## carbody         8.802230e+00  4        1.312424
## wheelbase       9.608223e+00  1        3.099713
## carheight       5.329608e+00  1        2.308594
## curbweight      1.726727e+01  1        4.155390
## enginetype      2.026449e+04  6        2.285044
## enginesize      1.827302e+01  1        4.274695
## fuelsystem      9.183312e+02  7        1.627957
## peakrpm         4.111872e+00  1        2.027775
## carManufacturer 3.822617e+06 21        1.434574
par(bg = "white")
par(mfrow = c(2, 2))
plot(mdl.pa6)

par(mfrow = c(1, 1))
print(AIC(mdl.pa4))
## [1] -241.1283
print(AIC(mdl.pa6))
## [1] -271.0404

Ardi’s note:

Improvements over model 4:

  • Higher adjusted R^2 (0.9497 > 0.9394)
  • Lower AIC (-271.04 < -241.13)
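
Since model 4 and model 6 are not nested, an F-test via anova() does not apply, and information criteria are the more natural comparison. A sketch of collecting adjusted R^2, AIC and BIC side by side, with two hypothetical non-nested mtcars models standing in for mdl.pa4 and mdl.pa6:

```r
# Two non-nested stand-in models with the same log response (assumption)
fit_a <- lm(log(mpg) ~ wt + hp, data = mtcars)
fit_b <- lm(log(mpg) ~ disp + qsec + factor(cyl), data = mtcars)

# Collect the comparison metrics in one table
comparison <- data.frame(
  model = c("fit_a", "fit_b"),
  adj_r2 = c(summary(fit_a)$adj.r.squared, summary(fit_b)$adj.r.squared),
  aic = c(AIC(fit_a), AIC(fit_b)),
  bic = c(BIC(fit_a), BIC(fit_b))
)
print(comparison)

# AIC is -2 * log-likelihood + 2 * (number of estimated parameters)
ll_a <- logLik(fit_a)
aic_manual <- -2 * as.numeric(ll_a) + 2 * attr(ll_a, "df")
```

AIC values are only comparable when both models use the same response (here both are on the log scale) and are fitted to the same rows. BIC penalises extra parameters more heavily, so it can be a useful second opinion when the larger model wins on AIC.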

6.2 MPG Model

6.2.1 Attempt 01

Here we are asking: “How does fuel economy (citympg) affect a car’s price, after controlling for other characteristics?”

vars <- c(
  "price", "safetyIncr", "aspiration", "carbody", "enginelocation",
  "carwidth", "curbweight", "enginetype", "cylinderNum", "enginesize",
  "stroke", "peakrpm", "compressionratio", "carManufacturer", "citympg"
)

dfm <- df[, vars]
dfm <- dfm[complete.cases(dfm), ]
dfm[] <- lapply(dfm, function(x) if (is.factor(x)) droplevels(x) else x)

# BE note: mpg1 is not included here, but have kept naming the same for
# consistency
lm.mpg2 <- lm(
  price ~ safetyIncr + aspiration + carbody +
    enginelocation + carwidth + curbweight + enginetype +
    cylinderNum + enginesize + stroke + peakrpm + compressionratio +
    carManufacturer + citympg,
  data = dfm
)
print(summary(lm.mpg2))
## 
## Call:
## lm(formula = price ~ safetyIncr + aspiration + carbody + enginelocation + 
##     carwidth + curbweight + enginetype + cylinderNum + enginesize + 
##     stroke + peakrpm + compressionratio + carManufacturer + citympg, 
##     data = dfm)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3625.7  -913.2     0.0   877.3  8457.7 
## 
## Coefficients: (2 not defined because of singularities)
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -5.859e+04  1.172e+04  -4.998 1.52e-06 ***
## safetyIncr1                5.175e+02  7.122e+02   0.727 0.468546    
## safetyIncr2                1.436e+03  7.279e+02   1.973 0.050274 .  
## safetyIncr3                1.408e+03  7.726e+02   1.823 0.070266 .  
## safetyIncr4                9.524e+02  9.365e+02   1.017 0.310726    
## aspirationturbo            2.132e+03  5.085e+02   4.193 4.57e-05 ***
## carbodyhardtop            -2.688e+03  1.196e+03  -2.248 0.025978 *  
## carbodyhatchback          -2.997e+03  1.047e+03  -2.861 0.004790 ** 
## carbodysedan              -2.915e+03  1.094e+03  -2.665 0.008502 ** 
## carbodywagon              -3.776e+03  1.149e+03  -3.287 0.001247 ** 
## enginelocationrear         1.189e+04  3.126e+03   3.803 0.000204 ***
## carwidth                   6.679e+02  1.947e+02   3.431 0.000767 ***
## curbweight                 4.573e+00  1.350e+00   3.388 0.000888 ***
## enginetypedohcv           -3.609e+03  3.558e+03  -1.014 0.312029    
## enginetypel               -4.170e+03  1.733e+03  -2.406 0.017263 *  
## enginetypeohc              1.352e+03  1.032e+03   1.310 0.192061    
## enginetypeohcf            -3.332e+03  1.589e+03  -2.097 0.037608 *  
## enginetypeohcv            -3.126e+03  1.188e+03  -2.632 0.009325 ** 
## enginetyperotor           -3.855e+03  3.371e+03  -1.144 0.254517    
## cylinderNumfour            3.030e+03  1.373e+03   2.207 0.028763 *  
## cylinderNumgeq_eight       6.613e+03  2.208e+03   2.996 0.003181 ** 
## cylinderNumleq_three       1.382e+04  3.516e+03   3.931 0.000126 ***
## cylinderNumsix             5.441e+03  1.586e+03   3.431 0.000768 ***
## enginesize                 7.207e+01  1.670e+01   4.316 2.79e-05 ***
## stroke                    -1.416e+03  9.024e+02  -1.569 0.118617    
## peakrpm                    1.865e+00  5.866e-01   3.180 0.001773 ** 
## compressionratio          -6.940e+01  6.173e+01  -1.124 0.262612    
## carManufactureraudi        1.051e+03  2.144e+03   0.490 0.624764    
## carManufacturerbmw         3.851e+03  1.955e+03   1.969 0.050654 .  
## carManufacturerbuick       5.182e+03  2.130e+03   2.433 0.016103 *  
## carManufacturerchevrolet  -4.532e+03  2.100e+03  -2.157 0.032480 *  
## carManufacturerdodge      -4.590e+03  1.693e+03  -2.712 0.007438 ** 
## carManufacturerhonda      -4.072e+03  1.787e+03  -2.278 0.024059 *  
## carManufacturerisuzu      -2.902e+03  1.791e+03  -1.620 0.107121    
## carManufacturerjaguar     -1.256e+02  2.199e+03  -0.057 0.954545    
## carManufacturermazda      -3.229e+03  1.545e+03  -2.090 0.038204 *  
## carManufacturermercury    -5.075e+03  2.488e+03  -2.039 0.043074 *  
## carManufacturermitsubishi -5.207e+03  1.686e+03  -3.088 0.002383 ** 
## carManufacturernissan     -3.816e+03  1.521e+03  -2.508 0.013141 *  
## carManufacturerpeugeot            NA         NA      NA       NA    
## carManufacturerplymouth   -4.899e+03  1.724e+03  -2.841 0.005087 ** 
## carManufacturerporsche     2.776e+03  2.532e+03   1.096 0.274553    
## carManufacturerrenault    -5.121e+03  2.146e+03  -2.386 0.018230 *  
## carManufacturersaab       -1.211e+03  1.704e+03  -0.711 0.478210    
## carManufacturersubaru             NA         NA      NA       NA    
## carManufacturertoyota     -4.033e+03  1.437e+03  -2.806 0.005652 ** 
## carManufacturervolkswagen -2.923e+03  1.682e+03  -1.738 0.084173 .  
## carManufacturervolvo      -3.055e+03  1.862e+03  -1.641 0.102835    
## citympg                    1.076e+02  6.651e+01   1.618 0.107634    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1816 on 158 degrees of freedom
## Multiple R-squared:   0.96,  Adjusted R-squared:  0.9483 
## F-statistic: 82.35 on 46 and 158 DF,  p-value: < 2.2e-16
par(bg = "white")
par(mfrow = c(2, 2))
plot(lm.mpg2)

par(mfrow = c(1, 1))

# print(car::vif(lm.mpg2))

BE note: vif raises an error: Error in vif.default(lm.mpg2): there are aliased coefficients in the model

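This (and the diagnostic plots) indicate:

  • Perfect collinearity: the model is over-specified (two carManufacturer coefficients are not estimable)
  • Heteroscedasticity in the residuals vs fitted plot (in general, and at observations 67, 17 and 75)
  • High-leverage points (observation 50)

The model gives a p-value of 0.107634 for citympg, which suggests its effect is not significant at the 5% level, but the issues above suggest the model is not valid, so this p-value cannot be relied on.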
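
The vif() error comes from aliased (perfectly collinear) coefficients, which show up as NA rows in the summary. A sketch of how to list them with alias(), on a hypothetical toy data set in which one factor level is fully determined by another (the variable names and values are made up for illustration):

```r
# Toy data: every maker "c" car has engine "rotor", and vice versa (assumption)
toy <- data.frame(
  maker = rep(c("a", "b", "c"), each = 4),
  engine = c(rep("ohc", 8), rep("rotor", 4)),
  x = c(1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5, 1.2, 2.2, 3.2, 4.2)
)
toy$y <- c(5.1, 6.0, 7.2, 7.9, 5.4, 6.6, 7.4, 8.3, 9.0, 10.1, 10.8, 12.1)

fit <- lm(y ~ x + engine + maker, data = toy)

# Coefficients that lm() could not estimate are returned as NA
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# alias() shows which estimable terms each aliased term depends on
print(alias(fit)$Complete)
```

Once the dependency is known, dropping or merging one of the overlapping terms should let vif() run.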

6.2.1.1 Attempting to address these issues

# use log(price)
lm.mpg2log <- lm(
  log(price) ~ safetyIncr + aspiration + carbody +
    enginelocation + carwidth + curbweight + enginetype +
    cylinderNum + enginesize + stroke + peakrpm + compressionratio +
    carManufacturer + citympg,
  data = dfm
)
summary(lm.mpg2log)
## 
## Call:
## lm(formula = log(price) ~ safetyIncr + aspiration + carbody + 
##     enginelocation + carwidth + curbweight + enginetype + cylinderNum + 
##     enginesize + stroke + peakrpm + compressionratio + carManufacturer + 
##     citympg, data = dfm)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.28965 -0.06365  0.00171  0.06179  0.43451 
## 
## Coefficients: (2 not defined because of singularities)
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                5.325e+00  7.684e-01   6.930 1.01e-10 ***
## safetyIncr1               -6.262e-02  4.669e-02  -1.341 0.181786    
## safetyIncr2               -5.709e-02  4.771e-02  -1.197 0.233273    
## safetyIncr3               -2.135e-02  5.064e-02  -0.422 0.673873    
## safetyIncr4                1.902e-03  6.139e-02   0.031 0.975318    
## aspirationturbo            1.202e-01  3.333e-02   3.605 0.000418 ***
## carbodyhardtop            -1.210e-01  7.840e-02  -1.544 0.124627    
## carbodyhatchback          -1.864e-01  6.865e-02  -2.715 0.007362 ** 
## carbodysedan              -1.423e-01  7.170e-02  -1.984 0.048980 *  
## carbodywagon              -2.277e-01  7.530e-02  -3.024 0.002912 ** 
## enginelocationrear         6.313e-01  2.049e-01   3.080 0.002440 ** 
## carwidth                   4.114e-02  1.276e-02   3.224 0.001537 ** 
## curbweight                 4.517e-04  8.847e-05   5.105 9.38e-07 ***
## enginetypedohcv           -1.778e-01  2.333e-01  -0.762 0.447128    
## enginetypel               -3.669e-01  1.136e-01  -3.230 0.001507 ** 
## enginetypeohc             -2.060e-02  6.765e-02  -0.304 0.761155    
## enginetypeohcf            -3.078e-01  1.042e-01  -2.954 0.003614 ** 
## enginetypeohcv            -8.030e-02  7.784e-02  -1.032 0.303855    
## enginetyperotor           -4.276e-01  2.210e-01  -1.935 0.054761 .  
## cylinderNumfour            1.181e-01  9.000e-02   1.312 0.191259    
## cylinderNumgeq_eight       1.301e-01  1.447e-01   0.899 0.370010    
## cylinderNumleq_three       7.171e-01  2.305e-01   3.111 0.002212 ** 
## cylinderNumsix             1.295e-01  1.040e-01   1.246 0.214712    
## enginesize                 1.932e-03  1.095e-03   1.765 0.079549 .  
## stroke                    -3.219e-02  5.915e-02  -0.544 0.587038    
## peakrpm                    7.092e-05  3.846e-05   1.844 0.067016 .  
## compressionratio           1.544e-03  4.046e-03   0.381 0.703370    
## carManufactureraudi        7.558e-02  1.406e-01   0.538 0.591579    
## carManufacturerbmw         2.871e-01  1.282e-01   2.240 0.026486 *  
## carManufacturerbuick      -3.798e-02  1.396e-01  -0.272 0.786005    
## carManufacturerchevrolet  -2.375e-01  1.377e-01  -1.725 0.086454 .  
## carManufacturerdodge      -2.841e-01  1.110e-01  -2.561 0.011388 *  
## carManufacturerhonda      -2.033e-01  1.172e-01  -1.735 0.084623 .  
## carManufacturerisuzu      -1.114e-01  1.174e-01  -0.949 0.344305    
## carManufacturerjaguar     -2.321e-01  1.442e-01  -1.610 0.109381    
## carManufacturermazda      -1.627e-01  1.013e-01  -1.607 0.110028    
## carManufacturermercury    -1.597e-01  1.631e-01  -0.979 0.328936    
## carManufacturermitsubishi -3.505e-01  1.105e-01  -3.170 0.001828 ** 
## carManufacturernissan     -2.060e-01  9.972e-02  -2.066 0.040505 *  
## carManufacturerpeugeot            NA         NA      NA       NA    
## carManufacturerplymouth   -3.109e-01  1.130e-01  -2.750 0.006647 ** 
## carManufacturerporsche     1.796e-01  1.660e-01   1.082 0.280897    
## carManufacturerrenault    -2.921e-01  1.407e-01  -2.076 0.039537 *  
## carManufacturersaab       -8.446e-02  1.117e-01  -0.756 0.450590    
## carManufacturersubaru             NA         NA      NA       NA    
## carManufacturertoyota     -2.357e-01  9.422e-02  -2.501 0.013398 *  
## carManufacturervolkswagen -1.781e-01  1.102e-01  -1.616 0.108162    
## carManufacturervolvo      -1.712e-01  1.220e-01  -1.403 0.162601    
## citympg                   -4.080e-03  4.360e-03  -0.936 0.350801    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1191 on 158 degrees of freedom
## Multiple R-squared:  0.9567, Adjusted R-squared:  0.9441 
## F-statistic: 75.96 on 46 and 158 DF,  p-value: < 2.2e-16
lm.mpg2logaic <- stepAIC(lm.mpg2log, direction = "backward", trace = FALSE)
print(summary(lm.mpg2logaic))
## 
## Call:
## lm(formula = log(price) ~ aspiration + carbody + carwidth + curbweight + 
##     enginetype + enginesize + peakrpm + carManufacturer + citympg, 
##     data = dfm)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.28430 -0.06153  0.00325  0.06208  0.44761 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                5.388e+00  7.113e-01   7.575 2.33e-12 ***
## aspirationturbo            1.194e-01  2.940e-02   4.060 7.53e-05 ***
## carbodyhardtop            -1.748e-01  7.132e-02  -2.451 0.015277 *  
## carbodyhatchback          -2.149e-01  6.319e-02  -3.400 0.000842 ***
## carbodysedan              -1.650e-01  6.231e-02  -2.648 0.008870 ** 
## carbodywagon              -2.402e-01  6.640e-02  -3.618 0.000393 ***
## carwidth                   4.182e-02  1.181e-02   3.542 0.000516 ***
## curbweight                 4.432e-04  7.213e-05   6.144 5.71e-09 ***
## enginetypedohcv           -2.518e-01  1.764e-01  -1.428 0.155191    
## enginetypel                1.857e-01  1.613e-01   1.151 0.251343    
## enginetypeohc             -4.827e-02  5.443e-02  -0.887 0.376485    
## enginetypeohcf             3.230e-01  1.581e-01   2.043 0.042610 *  
## enginetypeohcv            -6.121e-02  6.296e-02  -0.972 0.332350    
## enginetyperotor            2.113e-01  9.566e-02   2.209 0.028509 *  
## enginesize                 2.087e-03  8.213e-04   2.541 0.011970 *  
## peakrpm                    5.520e-05  3.476e-05   1.588 0.114183    
## carManufactureraudi       -2.622e-02  1.142e-01  -0.230 0.818649    
## carManufacturerbmw         3.169e-01  1.103e-01   2.872 0.004612 ** 
## carManufacturerbuick      -8.363e-02  1.117e-01  -0.749 0.454916    
## carManufacturerchevrolet  -2.064e-01  1.292e-01  -1.597 0.112112    
## carManufacturerdodge      -2.694e-01  1.024e-01  -2.630 0.009325 ** 
## carManufacturerhonda      -1.795e-01  1.051e-01  -1.709 0.089367 .  
## carManufacturerisuzu      -8.925e-02  1.101e-01  -0.810 0.418946    
## carManufacturerjaguar     -2.546e-01  1.355e-01  -1.878 0.062098 .  
## carManufacturermazda      -1.488e-01  9.609e-02  -1.548 0.123488    
## carManufacturermercury    -1.556e-01  1.552e-01  -1.002 0.317595    
## carManufacturermitsubishi -3.298e-01  1.005e-01  -3.281 0.001261 ** 
## carManufacturernissan     -2.030e-01  9.345e-02  -2.172 0.031255 *  
## carManufacturerpeugeot    -5.441e-01  1.891e-01  -2.878 0.004532 ** 
## carManufacturerplymouth   -2.935e-01  1.046e-01  -2.805 0.005621 ** 
## carManufacturerporsche     2.475e-01  1.591e-01   1.556 0.121594    
## carManufacturerrenault    -3.048e-01  1.275e-01  -2.390 0.017981 *  
## carManufacturersaab       -5.038e-02  1.014e-01  -0.497 0.619957    
## carManufacturersubaru     -6.205e-01  1.849e-01  -3.356 0.000978 ***
## carManufacturertoyota     -2.127e-01  8.774e-02  -2.424 0.016412 *  
## carManufacturervolkswagen -1.725e-01  9.773e-02  -1.765 0.079457 .  
## carManufacturervolvo      -1.111e-01  1.077e-01  -1.032 0.303519    
## citympg                   -4.162e-03  3.027e-03  -1.375 0.171028    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1184 on 167 degrees of freedom
## Multiple R-squared:  0.9548, Adjusted R-squared:  0.9448 
## F-statistic: 95.36 on 37 and 167 DF,  p-value: < 2.2e-16
par(bg = "white")
par(mfrow = c(2, 2))
plot(lm.mpg2logaic)

par(mfrow = c(1, 1))
print(car::vif(lm.mpg2logaic))
##                         GVIF Df GVIF^(1/(2*Df))
## aspiration      1.870544e+00  1        1.367678
## carbody         4.724866e+00  4        1.214224
## carwidth        9.343089e+00  1        3.056647
## curbweight      2.053810e+01  1        4.531898
## enginetype      7.457299e+03  6        2.102399
## enginesize      1.702941e+01  1        4.126670
## peakrpm         4.001969e+00  1        2.000492
## carManufacturer 1.198908e+05 21        1.321063
## citympg         5.710770e+00  1        2.389722
print(anova(lm.mpg2logaic, lm.mpg2log))
## Analysis of Variance Table
## 
## Model 1: log(price) ~ aspiration + carbody + carwidth + curbweight + enginetype + 
##     enginesize + peakrpm + carManufacturer + citympg
## Model 2: log(price) ~ safetyIncr + aspiration + carbody + enginelocation + 
##     carwidth + curbweight + enginetype + cylinderNum + enginesize + 
##     stroke + peakrpm + compressionratio + carManufacturer + citympg
##   Res.Df  RSS Df Sum of Sq      F Pr(>F)
## 1    167 2.34                           
## 2    158 2.24  9  0.099951 0.7833  0.632

BE notes:

  • Heteroscedasticity is addressed (residuals vs fitted)
  • Adjusted R-squared is high (0.9448): the model explains about 94% of the variance in log(price)
  • All GVIF^(1/(2*Df)) values are below the threshold of 5 (although curbweight, at 4.53, is the highest)
  • ANOVA shows no significant difference between the two models (p = 0.632), so the simpler model is sufficient -> this is the one we should use
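
For a single numeric predictor, the VIF is 1 / (1 - R^2) from regressing that predictor on the others. A base-R sketch of that calculation on mtcars (a stand-in data set, not the car price data), cross-checked against the diagonal of the inverse correlation matrix:

```r
# Three numeric predictors used purely for illustration (assumption)
predictors <- mtcars[, c("wt", "hp", "disp")]

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the rest
vif_manual <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  r2 <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_from_cor <- diag(solve(cor(predictors)))

print(round(rbind(manual = vif_manual, from_cor = vif_from_cor), 3))
```

For terms with more than one degree of freedom, car::vif() reports a generalised VIF, and GVIF^(1/(2*Df)) is on the scale of the square root of a VIF. Some authors therefore compare its square, rather than the value itself, against the usual thresholds.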

Linking back to original question: “How does fuel economy (citympg) affect a car’s price, after controlling for other characteristics?”

Coefficients

Term      Estimate     Std. Error   t value   Pr(>|t|)
citympg   -4.162e-03   3.027e-03    -1.375    0.171028

Answer:

  • Estimate: \(-4.162\times10^{-3}\)
  • p-value: \(0.171\)

So citympg does not have a statistically significant effect on log(price) after controlling for other factors (like curbweight, enginesize, carwidth, and carManufacturer)
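
Because the response is log(price), the citympg coefficient can be read as an approximate percentage change in price per extra mpg. A short sketch using only the estimate, standard error and residual degrees of freedom printed in the summary of lm.mpg2logaic:

```r
# Values copied from the printed summary of lm.mpg2logaic
est <- -4.162e-03
se <- 3.027e-03
df_resid <- 167

# Percentage change in price for one extra city mpg
pct <- 100 * (exp(est) - 1)

# 95% confidence interval for that percentage change
t_crit <- qt(0.975, df_resid)
ci_pct <- 100 * (exp(est + c(-1, 1) * t_crit * se) - 1)

# Two-sided p-value recomputed from the rounded estimate and standard error
p_val <- 2 * pt(-abs(est / se), df_resid)

cat(sprintf(
  "Effect: %.2f%% per mpg, 95%% CI [%.2f%%, %.2f%%], p = %.3f\n",
  pct, ci_pct[1], ci_pct[2], p_val
))
```

The point estimate is a price reduction of roughly 0.4% per extra mpg, but the interval includes zero, which is consistent with the p-value of 0.171.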

6.2.2 Attempt 02

Note here Ardi’s great model looks to address the question: ‘What car features predict a car’s fuel efficiency value?’

BE note: uses log(highwaympg/price) which represents MPG per dollar (or fuel efficiency value)

lm.mpg3 <- lm(
  formula = log(highwaympg / price) ~ carbody + enginetype +
    carwidth + curbweight + peakrpm + horsepower + carManufacturer,
  data = df
)

summary(lm.mpg3)
## 
## Call:
## lm(formula = log(highwaympg/price) ~ carbody + enginetype + carwidth + 
##     curbweight + peakrpm + horsepower + carManufacturer, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34331 -0.08570  0.00625  0.08940  0.31795 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -1.737e+00  8.655e-01  -2.007 0.046319 *  
## carbodyhardtop             2.046e-01  8.749e-02   2.339 0.020501 *  
## carbodyhatchback           2.633e-01  7.682e-02   3.428 0.000764 ***
## carbodysedan               2.191e-01  7.577e-02   2.892 0.004330 ** 
## carbodywagon               2.809e-01  8.026e-02   3.500 0.000594 ***
## enginetypedohcv            1.069e+00  2.271e-01   4.706 5.24e-06 ***
## enginetypel               -1.263e-01  1.978e-01  -0.638 0.524136    
## enginetypeohc             -7.980e-03  6.583e-02  -0.121 0.903655    
## enginetypeohcf            -9.385e-02  1.956e-01  -0.480 0.632014    
## enginetypeohcv             2.543e-02  7.151e-02   0.356 0.722621    
## enginetyperotor           -3.877e-01  1.024e-01  -3.785 0.000213 ***
## carwidth                  -2.666e-02  1.422e-02  -1.876 0.062432 .  
## curbweight                -7.130e-04  7.675e-05  -9.290  < 2e-16 ***
## peakrpm                   -7.619e-05  3.864e-05  -1.972 0.050277 .  
## horsepower                -5.866e-03  8.125e-04  -7.219 1.69e-11 ***
## carManufactureraudi       -1.322e-01  1.400e-01  -0.944 0.346274    
## carManufacturerbmw        -3.048e-01  1.291e-01  -2.362 0.019328 *  
## carManufacturerbuick      -1.186e-01  1.342e-01  -0.884 0.378130    
## carManufacturerchevrolet   3.625e-01  1.517e-01   2.390 0.017955 *  
## carManufacturerdodge       2.315e-01  1.219e-01   1.899 0.059286 .  
## carManufacturerhonda       1.909e-01  1.238e-01   1.543 0.124789    
## carManufacturerisuzu       1.151e-01  1.319e-01   0.873 0.383904    
## carManufacturerjaguar      2.716e-01  1.444e-01   1.882 0.061615 .  
## carManufacturermazda       1.028e-01  1.168e-01   0.880 0.380104    
## carManufacturermercury     2.441e-01  1.921e-01   1.271 0.205472    
## carManufacturermitsubishi  2.865e-01  1.199e-01   2.389 0.017989 *  
## carManufacturernissan      2.316e-01  1.117e-01   2.074 0.039618 *  
## carManufacturerpeugeot     3.178e-01  2.355e-01   1.350 0.178909    
## carManufacturerplymouth    2.727e-01  1.246e-01   2.189 0.029974 *  
## carManufacturerporsche    -1.621e-01  1.935e-01  -0.838 0.403178    
## carManufacturerrenault     2.260e-01  1.547e-01   1.461 0.145794    
## carManufacturersaab        7.343e-02  1.245e-01   0.590 0.556121    
## carManufacturersubaru      2.231e-01  2.294e-01   0.973 0.332175    
## carManufacturertoyota      2.229e-01  1.061e-01   2.101 0.037135 *  
## carManufacturervolkswagen  1.620e-01  1.181e-01   1.371 0.172220    
## carManufacturervolvo       9.926e-02  1.297e-01   0.765 0.445099    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1457 on 169 degrees of freedom
## Multiple R-squared:  0.9639, Adjusted R-squared:  0.9564 
## F-statistic:   129 on 35 and 169 DF,  p-value: < 2.2e-16
par(bg = "white")
par(mfrow = c(2, 2))
plot(lm.mpg3)

print(vif(lm.mpg3))
##                         GVIF Df GVIF^(1/(2*Df))
## carbody             4.260997  4        1.198640
## enginetype       4671.186671  6        2.022021
## carwidth            8.943383  1        2.990549
## curbweight         15.354936  1        3.918537
## peakrpm             3.266314  1        1.807295
## horsepower          9.926502  1        3.150635
## carManufacturer 50156.872654 21        1.293936

6.2.2.1 Problem

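BE note: plot(lm.mpg3) prints the message “not plotting observations with leverage one: 19, 76, 126, 130”. Is this a problem? A leverage of one means the model fits those four rows exactly, so their residuals are zero by construction and they cannot be assessed in the diagnostic plots.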
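
A common cause of leverage one is a factor level that appears in only one row, so that its dummy coefficient is determined by that row alone. Whether that is what happens here has not been checked. The sketch below uses made-up toy data (not the car data) to show how `hatvalues()` flags such a row; `toy`, `fit` and `lev_one` are illustrative names only.

```r
# Toy data (hypothetical): level "c" of the factor occurs exactly once
set.seed(1)
toy <- data.frame(
  y = rnorm(7),
  g = factor(c("a", "a", "a", "b", "b", "b", "c"))
)

fit <- lm(y ~ g, data = toy)

# Leverage (hat value) of each observation
h <- hatvalues(fit)

# Rows whose leverage is numerically equal to one
lev_one <- which(h > 1 - 1e-8)
print(lev_one)

# The singleton level is fitted exactly, so its residual is zero
print(residuals(fit)[lev_one])
```

Applied to this report, `which(hatvalues(lm.mpg3) > 1 - 1e-8)` would be expected to list the rows named in the message, and `table(df$carManufacturer)` or `table(df$enginetype)` would show whether any level contains a single car.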

lm.mpg3aic <- stepAIC(lm.mpg3, direction = "backward", trace = TRUE)
## Start:  AIC=-757.45
## log(highwaympg/price) ~ carbody + enginetype + carwidth + curbweight + 
##     peakrpm + horsepower + carManufacturer
## 
##                   Df Sum of Sq    RSS     AIC
## <none>                         3.5855 -757.45
## - carwidth         1   0.07464 3.6601 -755.23
## - peakrpm          1   0.08248 3.6679 -754.79
## - carbody          4   0.32781 3.9133 -747.52
## - enginetype       6   1.16168 4.7471 -711.92
## - horsepower       1   1.10571 4.6912 -704.35
## - carManufacturer 21   2.81832 6.4038 -680.55
## - curbweight       1   1.83111 5.4166 -674.88
print(summary(lm.mpg3aic))
## 
## Call:
## lm(formula = log(highwaympg/price) ~ carbody + enginetype + carwidth + 
##     curbweight + peakrpm + horsepower + carManufacturer, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34331 -0.08570  0.00625  0.08940  0.31795 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -1.737e+00  8.655e-01  -2.007 0.046319 *  
## carbodyhardtop             2.046e-01  8.749e-02   2.339 0.020501 *  
## carbodyhatchback           2.633e-01  7.682e-02   3.428 0.000764 ***
## carbodysedan               2.191e-01  7.577e-02   2.892 0.004330 ** 
## carbodywagon               2.809e-01  8.026e-02   3.500 0.000594 ***
## enginetypedohcv            1.069e+00  2.271e-01   4.706 5.24e-06 ***
## enginetypel               -1.263e-01  1.978e-01  -0.638 0.524136    
## enginetypeohc             -7.980e-03  6.583e-02  -0.121 0.903655    
## enginetypeohcf            -9.385e-02  1.956e-01  -0.480 0.632014    
## enginetypeohcv             2.543e-02  7.151e-02   0.356 0.722621    
## enginetyperotor           -3.877e-01  1.024e-01  -3.785 0.000213 ***
## carwidth                  -2.666e-02  1.422e-02  -1.876 0.062432 .  
## curbweight                -7.130e-04  7.675e-05  -9.290  < 2e-16 ***
## peakrpm                   -7.619e-05  3.864e-05  -1.972 0.050277 .  
## horsepower                -5.866e-03  8.125e-04  -7.219 1.69e-11 ***
## carManufactureraudi       -1.322e-01  1.400e-01  -0.944 0.346274    
## carManufacturerbmw        -3.048e-01  1.291e-01  -2.362 0.019328 *  
## carManufacturerbuick      -1.186e-01  1.342e-01  -0.884 0.378130    
## carManufacturerchevrolet   3.625e-01  1.517e-01   2.390 0.017955 *  
## carManufacturerdodge       2.315e-01  1.219e-01   1.899 0.059286 .  
## carManufacturerhonda       1.909e-01  1.238e-01   1.543 0.124789    
## carManufacturerisuzu       1.151e-01  1.319e-01   0.873 0.383904    
## carManufacturerjaguar      2.716e-01  1.444e-01   1.882 0.061615 .  
## carManufacturermazda       1.028e-01  1.168e-01   0.880 0.380104    
## carManufacturermercury     2.441e-01  1.921e-01   1.271 0.205472    
## carManufacturermitsubishi  2.865e-01  1.199e-01   2.389 0.017989 *  
## carManufacturernissan      2.316e-01  1.117e-01   2.074 0.039618 *  
## carManufacturerpeugeot     3.178e-01  2.355e-01   1.350 0.178909    
## carManufacturerplymouth    2.727e-01  1.246e-01   2.189 0.029974 *  
## carManufacturerporsche    -1.621e-01  1.935e-01  -0.838 0.403178    
## carManufacturerrenault     2.260e-01  1.547e-01   1.461 0.145794    
## carManufacturersaab        7.343e-02  1.245e-01   0.590 0.556121    
## carManufacturersubaru      2.231e-01  2.294e-01   0.973 0.332175    
## carManufacturertoyota      2.229e-01  1.061e-01   2.101 0.037135 *  
## carManufacturervolkswagen  1.620e-01  1.181e-01   1.371 0.172220    
## carManufacturervolvo       9.926e-02  1.297e-01   0.765 0.445099    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1457 on 169 degrees of freedom
## Multiple R-squared:  0.9639, Adjusted R-squared:  0.9564 
## F-statistic:   129 on 35 and 169 DF,  p-value: < 2.2e-16
par(bg = "white")
par(mfrow = c(2, 2))
plot(lm.mpg3aic)

vif(lm.mpg3aic)
##                         GVIF Df GVIF^(1/(2*Df))
## carbody             4.260997  4        1.198640
## enginetype       4671.186671  6        2.022021
## carwidth            8.943383  1        2.990549
## curbweight         15.354936  1        3.918537
## peakrpm             3.266314  1        1.807295
## horsepower          9.926502  1        3.150635
## carManufacturer 50156.872654 21        1.293936
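
The AIC column in the `stepAIC` output can be checked by hand. For a linear model, `stepAIC` relies on `extractAIC`, which computes n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below plugs in the figures printed above (n = 205, from 169 residual degrees of freedom plus 36 coefficients); `aic_lm` and the other names are illustrative.

```r
n <- 205  # 169 residual df + 36 coefficients (including the intercept)

# AIC as reported by extractAIC() for a linear model (up to an additive constant)
aic_lm <- function(rss, edf, n) n * log(rss / n) + 2 * edf

# Full model, and the model with carwidth (1 df) dropped
aic_full <- aic_lm(rss = 3.5855, edf = 36, n = n)
aic_no_carwidth <- aic_lm(rss = 3.6601, edf = 35, n = n)

print(round(c(full = aic_full, no_carwidth = aic_no_carwidth), 2))
```

Every single-term deletion in the table raises the AIC relative to `<none>`, which is why the backward search stops at the starting model and the second summary repeats the first.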

Summary

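  • Model explains ~96% of the variance in log(highwaympg/price), our fuel efficiency value measure (R-squared 0.964, adjusted 0.956)
  • Backward stepAIC removed no terms (<none> has the lowest AIC), so lm.mpg3aic is identical to lm.mpg3
  • Heteroscedasticity: the residuals vs fitted plot looks good, so there is no clear evidence of heteroscedasticity
  • Collinearity: every GVIF^(1/(2*Df)) value is below 5, but that statistic is on the square-root scale of a VIF. Squared, curbweight (15.4) exceeds 10, and carwidth (8.9) and horsepower (9.9) exceed 5, so some collinearity among the size and power variables remains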
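
The collinearity check can be made explicit by squaring GVIF^(1/(2*Df)), which puts terms with different degrees of freedom on the usual VIF scale. The sketch below retypes the values from the `vif()` output above and compares them with the rule-of-thumb cut-offs of 5 and 10; those cut-offs are conventions rather than hard limits, and the data frame and column names are illustrative.

```r
# GVIF values copied from the vif() output above
gvif <- data.frame(
  term = c("carbody", "enginetype", "carwidth", "curbweight",
           "peakrpm", "horsepower", "carManufacturer"),
  GVIF = c(4.260997, 4671.186671, 8.943383, 15.354936,
           3.266314, 9.926502, 50156.872654),
  Df = c(4, 6, 1, 1, 1, 1, 21),
  stringsAsFactors = FALSE
)

# Adjusted GVIF, and its square (comparable to an ordinary VIF)
gvif$adj <- gvif$GVIF^(1 / (2 * gvif$Df))
gvif$adj_sq <- gvif$adj^2

# Flag terms against the two common rule-of-thumb thresholds
gvif$above_5 <- gvif$adj_sq > 5
gvif$above_10 <- gvif$adj_sq > 10

print(gvif)
```

For terms with one degree of freedom the squared value is simply the GVIF itself, so the flags for carwidth, curbweight and horsepower can be read directly off the original table.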

Open question: should we first mention that the model is flawed because of the leverage-one observations, and then tell the story?

The story is fairly clear, though:

  • curbweight and horsepower are the strongest predictors (largest absolute t values), both with significant negative coefficients, which makes sense:
    • heavier (curbweight up) -> mpg per dollar (down)
    • more powerful (horsepower up) -> mpg per dollar (down)

The rotor engine type has a large negative coefficient (-0.388, p < 0.001), indicating a poor fuel efficiency value relative to the reference engine type, which is consistent with prior knowledge.
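
Because the response is log(highwaympg/price), a coefficient b corresponds to a multiplicative change of exp(b) in mpg per dollar, holding the other terms fixed. The sketch below converts three estimates from the summary into percentage changes; the step sizes (100 units of curbweight, 10 units of horsepower) are arbitrary choices for illustration, and the variable names are not from the original analysis.

```r
# Estimates copied from the model summary above
coefs <- c(curbweight = -7.130e-04,
           horsepower = -5.866e-03,
           enginetyperotor = -3.877e-01)

# Size of the change in each predictor that we want to interpret
step <- c(curbweight = 100, horsepower = 10, enginetyperotor = 1)

# Percentage change in highwaympg/price for that change in the predictor
pct_change <- 100 * (exp(coefs * step) - 1)
print(round(pct_change, 1))
```

On this scale the rotor effect is by far the largest of the three, while the curbweight and horsepower effects depend on the step size chosen.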

6.2.3 Limitations

TBI

6.2.4 Suggestions for Improvements

TBI